Abstract

Toxic  algae  are  a  growing  concern  in  the  marine  environment.  One  unique 
marine diatom, Pseudo-nitzschia multiseries (Hasle) Hasle, produces the neurotoxin
domoic  acid,  which  is  the  cause  of  amnesic  shellfish  poisoning.  The  molecular 
characterization  of  this  organism  has  been  limited  to  date.  Therefore,  the  focus  of  this 
thesis  was  to  identify  and  initiate  characterization  of  actively  expressed  genes  that  control 
cell  growth  and  physiology  in  P.  multiseries,  with  the  specific  goal  of  identifying  genes 
that  may  play  a  significant  role  in  toxin  production. 

The  first  step  in  gene  discovery  was  to  establish  a  complementary  DNA  (cDNA) 
library  and  a  database  of  expressed  sequence  tags  (ESTs)  for  P.  multiseries.  2552 
cDNAs  were  sequenced,  generating  a  set  of  1955  unique  contigs,  of  which  21% 
demonstrated  significant  similarity  with  known  protein  coding  sequences.  Among  the 
genes  identified  by  sequence  similarity  were  several  involved  in  photosynthetic 
pathways,  including  fucoxanthin-chlorophyll  a/c  light  harvesting  protein  and  a 
C4-specific  pyruvate,  orthophosphate  dikinase.  Several  genes  that  may  be  involved  in 
domoic  acid  synthesis  were  also  revealed  through  sequence  similarity,  for  example, 
glutamate  dehydrogenase  and  5-oxo-L-prolinase.  In  addition,  the  identification  of 
sequences  that  appear  novel  to  Pseudo-nitzschia  may  provide  insight  into  unique  aspects 
of  Pseudo-nitzschia  biology,  such  as  toxin  production. 

Genes whose expression patterns were correlated with toxin production were
identified by hybridization to a microarray manufactured from 5376 cDNAs.
121 cDNAs, representing 12 unique cDNA contigs or non-redundant cDNAs, showed
significantly  increased  expression  levels  in  P.  multiseries  cell  populations  that  were 
actively  producing  toxin.  The  up-regulated  transcripts  included  cDNAs  with  sequence 
similarity  to  3-carboxymuconate  cyclase,  phosphoenolpyruvate  carboxykinase,  an  amino 
acid  transporter,  a  small  heat  shock  protein,  a  long-chain  fatty  acid  Co-A  ligase,  and  an 
aldo/keto  reductase.  These  results  provide  a  framework  for  investigating  the  control  of 
toxin  production  in  P.  multiseries.  These  transcripts  may  also  be  useful  in  ecological 
field  studies  in  which  they  may  serve  as  signatures  of  toxin  production.  Prospects  for 
further  application  of  molecular  genetic  technology  to  the  understanding  of  the 
physiology and ecology of P. multiseries are discussed.
Toxic algae have become a growing concern in the study of the marine environment
during  the  past  few  decades  (Anderson,  1994;  Sellner,  2003).  Pseudo-nitzschia 
multiseries  is  a  particularly  interesting  toxin-producing  alga,  as  it  represents  one  of  the 
only  known  species  to  produce  a  phycotoxin  within  the  division  Bacillariophyta  (Bates, 
1998).  The  present  study  focused  on  the  molecular  characterization  of  P.  multiseries, 
with  special  interest  in  both  its  role  as  a  harmful  alga  and  as  a  member  of  the  diatom 
community. 

Diatoms: Diatoms (Bacillariophyta) represent an important group of bloom-forming
eukaryotic phytoplankton (Mann and Droop, 1996). They play a major role in global
carbon  cycling  and  nutrient  cycling  in  the  marine  environment  (Werner,  1977;  Field  et 
al.,  1998;  Mann,  1999).  One  distinguishing  characteristic  of  diatoms  is  their  intricate 
siliceous cell walls, or frustules. Due to the uptake and processing of silicon that is
required to produce these frustules, diatoms play a key role in the biogeochemical cycling
of silicon and are responsible for the production of 240 x 10^12 moles of silica per year
(Treguer  et  al.,  1995). 

Toxin-producing  diatoms  appear  to  be  limited  to  twelve  species  that  produce  the 
neurotoxin, domoic acid (DA): Amphora coffeaeformis, Pseudo-nitzschia multiseries,
P. pseudodelicatissima, P. calliantha, P. australis, P. seriata, P. fraudulenta,
P. delicatissima, P. turgidula, P. multistriata, P. pungens and Nitzschia navis-varingica
(Bates et al., 1998; Bates, 2000). The existence of non-toxic strains of P. multiseries,
P. seriata, P. australis, P. delicatissima, P. calliantha and P. pseudodelicatissima, and of
toxic  strains  of  the  generally  non-toxic  P.  pungens,  suggests  genetic  variability  among 
strains  of  the  Pseudo-nitzschia  species  and  differences  in  regulatory  factors  controlling 
DA  production. 

Molecular  characterization  of  the  toxin-producing  diatom  species  has  been 
limited.  Ribosomal  RNA  genes  have  been  characterized  for  phylogeny  and  field 
identification  studies  (e.g.  Hasle  1994,  1995;  Scholin,  1994;  Lundholm,  2002). 

Molecular  phylogeny  utilizing  ribosomal  RNA  has  contributed  to  changes  in 
nomenclature of the genus Pseudo-nitzschia over the last decade (Hasle, 1994, 1995;
Hasle  et  al.,  1996).  For  example,  once  considered  the  same  species,  P.  multiseries  was 
distinguished  from  P.  pungens  due  to  differences  in  morphology,  physiology  and  genetic 
structure  of  large-subunit  ribosomal  RNA.  Nine  DNA  microsatellite  markers  have 
recently  been  developed.  These  markers  have  been  used  in  field  studies  to  distinguish  and 
analyze  relationships  among  field  isolates  and  in  laboratory  mating  experiments  to 
demonstrate  Mendelian  inheritance  (Evans  et  al.,  2004). 

Among  the  toxin-producing  diatoms,  the  physiology  and  ecology  of  P.  multiseries 
have been studied the most extensively. Therefore, this species was selected as a model to
investigate  genes  associated  with  toxin  production  and  overall  growth  and  physiology 
within  this  group  of  marine  algae. 

Pseudo-nitzschia multiseries: Pseudo-nitzschia multiseries is a species of pennate
diatom that produces the neurotoxin domoic acid (DA). Production of phycotoxins by
diatoms  was  unknown  prior  to  1987,  when  P.  multiseries  first  bloomed  in  Cardigan  Bay, 
Prince  Edward  Island,  Canada  (Bates  et  al.,  1989).  This  initial  bloom  caused  amnesic 
shellfish  poisoning  (ASP)  in  humans  who  had  consumed  contaminated  blue  mussels 
(Mytilus  edulis).  DA  was  isolated  from  extracts  of  mussels  that  had  been  feeding  on  P. 
multiseries  and  subsequent  studies  verified  the  production  of  DA  by  P.  multiseries  and 
other  members  of  the  genus  Pseudo-nitzschia  (Wright  et  al.,  1989).  Since  1987, 
environmental  factors  influencing  DA  production  in  Pseudo-nitzschia  spp.  have  been 
investigated,  especially  in  P.  multiseries  (Bates,  Bates  et  al.,  1998).  However,  the 
mechanism  of  DA  production  and  genetic  regulation  is  still  not  clearly  understood. 

DA is a water-soluble tricarboxylic amino acid with a molecular weight of 311
that includes a proline-like ring containing an isoprenoid and a carboxymethyl side chain
(Figure 1-1) (Takemoto and Daigo, 1958). An analog of the neurotransmitters glutamate
and kainate, DA predominantly binds to a kainate sub-type of ionotropic glutamate
receptor in the central nervous system (Hampson and Manalo, 1998; Berman et al.,
2002). DA has a binding affinity 100 times greater than glutamate and three times
greater than kainate. The high-affinity binding of DA to glutamate receptors leads to an
influx of calcium ions in neurons expressing glutamate receptors, which in turn leads to
massive depolarization of these neurons, neuronal swelling, and ultimately cell death
(Stewart et al., 1990; Olney, 1994). DA toxicity most severely affects hippocampal nerve
cells associated with memory retention, suggesting a functional basis for the memory loss
of patients diagnosed with ASP due to DA produced by P. multiseries (Todd, 1993).

Figure 1-1: Domoic acid and Structural Analogs

Characterization  of  the  biosynthetic  pathways  leading  to  DA  synthesis  has  been 
limited. 13C- and 14C-labeling studies suggest a model involving condensation of an
activated glutamate derivative from the citric acid cycle with an isoprenoid chain, such as
geranyl  pyrophosphate,  and  subsequent  cyclization  as  a  possible  mechanism  to  generate 
DA  (Figure  1-2)  (Douglas  et  al.,  1992;  Ramsey  et  al.,  1998).  In  separate  studies,  Smith 
and  colleagues  have  focused  on  the  relationship  of  proline  to  DA  metabolism,  by 
measuring  amino  acid  levels  to  show  that  proline  and  DA  levels  are  inversely  correlated. 
They  suggest  that  proline  is  an  upstream  precursor  to  DA,  or  that  DA  substitutes  for  the 
physiological  function  of  proline.  A  proposed  model  showing  the  hypothesized 
derivation  of  3-hydroxy-glutamate  from  proline  metabolism,  which  would  then  lead  to 
DA production, based on the suggestion of Smith et al. (2001), is shown in Figure 1-3.

Growth  Dynamics  and  DA  production:  Pseudo-nitzschia  multiseries  growth  rates  range 
from 0.21 to 1.20 divisions per day during the exponential phase in batch culture, while
cell yields average between 100,000 and 300,000 cells/mL, depending on nutrient conditions.
Numerous  studies  on  the  growth  of  P.  multiseries  in  culture  have  shown  that  DA 
production  does  not  begin  until  early  stationary  phase,  i.e.  toxin  is  not  typically  produced 
during  the  exponential  growth  phase  (Bates  et  al.,  1989,  1991,  1993,  1995;  Subba  Rao  et 
al.,  1990;  Reap,  1991;  Douglas  and  Bates,  1992;  Douglas  et  al.,  1993;  Lewis  et  al.,  1993). 
In  these  studies,  cellular  DA  concentrations  reached  a  peak  about  one  week  after  the 
beginning  of  the  stationary  phase  in  batch  culture,  while  the  amount  of  DA  released  into 
the  culture  medium  continued  to  increase  throughout  the  mid-  and  late-  stationary  phases 
(Bates et al., 1991; Pan et al., 1996). In other studies that exposed P. multiseries to
conditions in which cell division during mid-exponential phase was slowed relative to
normal, cells did produce low levels of toxin (Bates et al., 1993; Pan et al., 1996).
Therefore, toxin production appears to be linked to stages in the cell cycle when cell
division has stopped or cells are arrested as the division rate of the entire population of
cells slows due to some limiting factor (Bates, 1998).

Figure 1-2: Proposed pathway for Domoic Acid Biosynthesis, Ramsey et al., 1998

Figure 1-3: Proposed pathway for Domoic Acid Biosynthesis, Smith et al., 2001
(figure title: Proline metabolism may generate critical precursor to DA)
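
The growth rates quoted in this section are expressed in divisions per day. As a minimal sketch of how such a rate is commonly derived from two cell counts taken during exponential growth, the snippet below uses the standard doubling formula. The function name and the cell counts are hypothetical illustrations, not values or methods taken from the studies cited.

```python
import math


def divisions_per_day(n0, n1, days):
    """Mean number of cell doublings per day between two cell counts."""
    if n0 <= 0 or n1 <= 0 or days <= 0:
        raise ValueError("counts and elapsed time must be positive")
    # Each division doubles the population, so the number of divisions
    # separating the two counts is log2(n1 / n0).
    return math.log2(n1 / n0) / days


# Hypothetical batch-culture counts (cells/mL); not data from the cited studies.
rate = divisions_per_day(n0=5_000, n1=40_000, days=3.0)
print(f"{rate:.2f} divisions per day")
```

An eightfold increase over three days corresponds to one division per day, which falls within the 0.21 to 1.20 range reported for P. multiseries.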

DA  production  by  P.  multiseries  has  been  associated  with  physiological  stress 
caused  by  silicon  (Si)  limitation.  Diatoms  require  Si  for  DNA  synthesis  as  well  as  for 
frustule  construction;  Si  may  therefore  become  a  limiting  factor.  Bates  et  al.  (1991)  and 
Pan  et  al.  (1996)  both  showed  that  the  production  of  DA  by  P.  multiseries  was  inversely 
correlated  with  ambient  silicate  concentration  and  that  DA  accumulated  in  cells  when  the 
division  rate  declined  due  to  depletion  of  Si.  Brzezinski  et  al.  (1990)  have  shown  that  Si 
limitation in diatoms alters the normal progression of cells through the cell cycle (G1, S,
G2, M) by arresting cells at the G1/S boundary and in the G2 or M phases. DA
production in P. multiseries appears to begin at the end of G1 or during the G2 phase of
the cell cycle, which correlates with cell cycle arrest due to Si limitation (Pan et al.,
1996; Bates and Richard, 1996). Si limitation may impede the progression of the cell
cycle  by  interfering  with  DNA  synthesis.  In  separate  studies,  Sullivan  and  Volcani 
(1973)  showed  that  cessation  of  DNA  synthesis  by  Si  starvation  was  caused  by  a 
decrease in DNA polymerase and thymidylate (TMP)-kinase activity, but not by a lack of
energy  or  precursors.  DNA  polymerases  A  and  D  are  only  synthesized  in  the  presence  of 
Si,  whereas  at  least  15  other  proteins  are  formed  only  in  the  absence  of  Si.  These  results 
suggest  that  Si  levels  affect  regulation  of  gene  expression  in  diatoms  (Pan  et  al.,  1998). 

Phosphorus (P) limitation has also been implicated as a trigger for DA
production (Bates et al., 1991; Pan et al., 1996, 1998). Toxin production was induced in
batch culture when phosphate supply was low (<1 µM) and alkaline phosphatase activity
(an  indicator  of  P-limitation)  was  high.  In  addition,  synthesis  of  DA  was  depressed  by 
the  addition  of  inorganic  P,  which  stimulated  cell  growth.  In  contrast  to  Si  and  P 
limitation,  nitrogen  (N)  limitation  restricts  toxin  production  due  to  insufficient  levels  of 
free  N  to  synthesize  DA.  In  one  study  where  P.  multiseries  was  N-limited  and  failed  to 
produce DA, addition of nitrate subsequently stimulated DA production (Bates et al.,
1991). 

An  inverse  relationship  has  been  demonstrated  between  DA  production  and 
growth  rate  of  P.  multiseries  (Bates  et  al.,  1996;  Pan  et  al.,  1996).  This  relationship  has 
been  attributed  to  the  availability  of  high-energy  intermediates  necessary  for  DA 
synthesis,  which  varies  over  the  growth  cycle  of  P.  multiseries  (Pan  et  al.,  1996, 1998). 
During  exponential  phase,  cells  are  actively  growing  and  less  ATP  is  available  for  DA 
synthesis,  whereas  at  stationary  phase,  carbon  assimilation  is  reduced  so  available  ATP 
may  be  used  to  support  DA  production  (Pan  et  al.,  1996). 

Few  laboratory  studies  have  been  completed  in  species  other  than  P.  multiseries. 
In  P.  seriata,  the  pattern  of  DA  production  was  similar  to  that  of  P.  multiseries,  with 
minimal  toxin  production  during  exponential  phase  and  increased  production  throughout 
stationary phase (Lundholm et al., 1994; Fehling et al., 2004). In contrast, toxin
production  in  P.  australis  began  during  exponential  phase  and  remained  fairly  constant 
during  stationary  phase  (Garrison  et  al.,  1992). 

Axenic  vs.  nonaxenic  cultures:  Several  bacterial  isolates  have  been  shown  to  enhance  DA 
production  by  P.  multiseries  (Bates  et  al.,  1995).  While  P.  multiseries  can  produce  DA  in 
axenic  cultures  (Douglas  and  Bates,  1992;  Douglas  et  al.,  1993),  reintroduction  of 
bacteria to axenic cultures increased DA production 2- to 115-fold (Bates et
al.,  1995).  There  is  no  evidence  that  bacteria  in  these  cultures  are  capable  of  autonomous 
DA  production  (Gaudet,  2001;  Bates  et  al.,  2004),  and  the  mechanism  for  enhanced  DA 
production  due  to  bacterial  presence  is  uncertain.  Bacterial  numbers  increase  after  the 
beginning  of  stationary  phase  of  P.  multiseries,  corresponding  with  increased  toxin 
production.  However,  axenic  cultures  also  exhibit  the  characteristic  increase  in  DA 
production  during  stationary  phase.  One  suggested  hypothesis  for  enhanced  DA 
production  in  non-axenic  cultures  vs.  axenic  cultures  is  that  the  bacteria  produce  or 
regenerate  precursor  molecules  necessary  for  DA  production,  rather  than  directly 
contributing  to  DA  synthesis  (Douglas  and  Bates,  1992;  Bates,  1998). 
Asexual vs. sexual reproduction: As a diatom, P. multiseries demonstrates a decrease in
cell  size  during  vegetative  division.  The  mean  cell  length  of  a  population  of  any  diatom 
decreases  over  successive  generations.  Diatom  frustules  are  composed  of  two  valves  that 
fit  together  like  a  petri  dish,  with  one  larger  valve  (epitheca)  overlapping  the  smaller 
valve  (hypotheca).  Therefore,  each  mitotic  division  results  in  the  formation  of  two 
differently  sized  daughter  cells,  one  that  is  the  same  size  as  the  parent  and  one  that  is 
slightly  smaller  (Round  et  al.,  1990).  In  P.  multiseries,  an  observed  decrease  in  the 
capability  to  produce  DA  may  coincide  with  the  decrease  in  cell  length  (Bates  et  al., 
1998),  although  not  all  isolates  necessarily  follow  this  trend  (Dr.  Stephen  Bates,  personal 
communication).  Interestingly,  cell  deformities  also  tend  to  appear  in  P.  multiseries  cells 
after  a  certain  period  in  culture  (Villac,  1996;  Bates  et  al.,  1998). 
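
The size reduction described above can be sketched as a small simulation. This is an illustrative model only: it assumes every cell divides once per generation and that the smaller daughter is shorter by a fixed step, and the starting length and step size are hypothetical values chosen for the example.

```python
from collections import Counter


def divide(population, step):
    """One round of vegetative division.

    `population` maps cell length to number of cells. Each parent yields one
    daughter of the parent's length and one daughter shorter by `step`.
    """
    next_generation = Counter()
    for length, count in population.items():
        next_generation[length] += count          # daughter keeping the larger valve
        next_generation[length - step] += count   # daughter built on the smaller valve
    return next_generation


def mean_length(population):
    """Mean cell length of the population."""
    total = sum(population.values())
    return sum(length * count for length, count in population.items()) / total


# Hypothetical values: one 100 um cell, 1 um lost per smaller daughter.
pop = Counter({100: 1})
for generation in range(1, 5):
    pop = divide(pop, step=1)
    print(generation, sum(pop.values()), mean_length(pop))
```

Under these assumptions the largest cells persist in every generation, but the mean length falls by half a step per generation, which matches the population-level decline described in the text.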

Sexual  reproduction  restores  the  original,  larger  cell  dimensions  of  P.  multiseries 
and  also  appears  to  restore  DA  production  in  cultures  that  had  experienced  a  reduction  in 
DA  production  over  time.  Davidovich  and  Bates  (1998)  described  the  sexual 
reproductive  cycles  of  P.  multiseries  as  follows:  pairing  of  parent  cells  of  opposite 
mating  types,  gamete  production,  fusion  of  gametes  to  form  zygotes,  enlargement  of 
auxospores,  and  formation  of  long,  initial  cells  that  usually  produced  higher  levels  of  DA 
than  the  original  parent  cells. 

Molecular Technology: While a considerable amount of research has been
completed to investigate the biology of P. multiseries, the molecular characterization of
this organism has been limited to date. Further knowledge of the pathways that
control  the  growth  and  physiology  of  P.  multiseries,  including  toxin  production,  requires 
characterization  of  the  genes  that  govern  the  regulation  of  these  pathways.  Therefore, 
this  thesis  project  employed  molecular  techniques  to  identify  and  initiate  characterization 
of  actively  expressed  genes  in  P.  multiseries. 

Only  a  subset  of  all  encoded  genes  is  expressed  in  any  given  cell,  and  the  levels 
and  timing  of  gene  expression  determine  the  fate  of  individual  cells.  The  central  dogma 
of molecular genetics describes gene expression as the process of DNA transcription into
messenger  RNA  (mRNA),  which  is  subsequently  translated  into  functional  protein. 

Since  gene  expression  is  initiated  at  the  transcriptional  level,  gene  discovery  has  often 
focused  on  studying  gene  expression  by  measuring  mRNAs.  Comparing  the  amount  of 
specific  mRNAs  between  two  samples  provides  a  mechanism  to  screen  for  genes  that  are 
turned  on  or  off  under  defined  physiological  or  environmental  conditions. 

Several  techniques  have  been  developed  to  analyze  differentially  expressed  genes 
between  two  or  more  populations  of  nucleic  acids.  These  comparative  techniques  include 
subtractive  hybridization  (Sagerstrom  et  al.,  1997)  and  microarray  technology  (Brown, 
1999;  Schena  et  al.,  1995,  1996;  Shalon  et  al.,  1996).  In  the  subtractive  hybridization 
approach,  mRNA  from  the  first  cell  type  is  converted  to  single-stranded  complementary 
DNAs  (ss  cDNAs),  which  are  then  hybridized  to  an  excess  of  all  the  mRNAs  that  are 
expressed  in  the  second  cell  type.  Genes  that  are  expressed  in  both  cell  types  will  form 
cDNA/mRNA  duplexes,  while  cDNA  that  is  expressed  in  only  the  first  cell  type  will  be 
single-stranded  and  can  then  be  separated  from  the  duplexes  by  a  number  of  methods. 
Subtractive  hybridization  is  a  relatively  simple  technique,  which  has  been  particularly 
useful  in  the  identification  of  single  significant  mRNAs  such  as  the  isolation  of  T-cell 
receptor  mRNAs  by  comparing  gene  expression  profiles  between  T  and  B  cells  (Hedrick 
et al., 1984) and the identification of the myoD gene, a master regulator of muscle
differentiation (Davis et al., 1987). Within marine ecology, suppression subtractive
hybridization  is  currently  being  used  to  identify  genes  that  are  up-regulated  in  fish 
exposed  to  various  environmental  contaminants  (Tsoi,  2004).  Alternative  protocols  to 
standard  subtractive  hybridization  to  identify  differentially  expressed  transcripts  include 
representational  difference  analysis  (RDA)  and  suppression  PCR,  which  are  PCR-based 
selection  techniques.  While  all  of  these  techniques  provide  a  method  for  discovery  of 
differentially  expressed  genes  with  high  sensitivity,  they  do  not  allow  the  survey  of  a 
broad  number  of  genes  in  a  high-throughput  mode. 
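
Conceptually, the subtractive hybridization step described above behaves like a set difference: transcripts shared by both cell types are removed as duplexes, and those unique to the first cell type remain. The sketch below illustrates only that logic. The function name and transcript labels are hypothetical, and the model ignores abundance, hybridization kinetics, and incomplete subtraction.

```python
def subtract(tester, driver):
    """Return transcripts present in the tester population but absent from the driver."""
    # Shared transcripts would form cDNA/mRNA duplexes and be removed;
    # only tester-specific transcripts stay single-stranded.
    return sorted(set(tester) - set(driver))


# Hypothetical transcript labels for two cell types.
first_cell_type = ["actin", "tubulin", "geneX", "geneY"]
second_cell_type = ["actin", "tubulin", "geneZ"]

print(subtract(first_cell_type, second_cell_type))
```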
Microarrays allow the monitoring of thousands of genes in parallel. The first step
in  construction  of  a  cDNA  microarray  is  to  create  a  cDNA  library  from  reverse 
transcription  of  mRNAs  in  cells  or  tissues  of  interest.  Frequently,  a  subset  of  the  cDNAs 
will  be  sequenced  to  begin  to  identify  genes  of  interest  and  to  verify  the  quality  of  the 
library.  cDNA  arrays  are  constructed  by  depositing  thousands  of  amplified  cDNAs  onto 
glass  microscope  slides,  with  each  cDNA  represented  as  an  independent  spot  on  the 
array.  The  cDNA  microarray  is  then  hybridized  to  fluorescently  labeled  cDNA  prepared 
by  reverse  transcription  of  mRNA  isolated  from  two  different  populations  of  interest. 
Competitive hybridization of two samples labeled separately with Cy3 and Cy5 allows
the  ratio  of  mRNA  abundance  between  the  two  samples  to  be  compared  for  each 
individual  cDNA  on  the  microarray  (Brown  and  Botstein,  1999).  Microarray  analysis 
applied  within  the  field  of  phytoplankton  ecology  offers  the  potential  to  discover  genes 
involved  in  ecologically  relevant  processes,  such  as  toxin  biosynthesis,  population 
growth  and  bloom  dynamics,  photosynthesis,  and  nutrient  cycling. 
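
The two-color comparison described above is usually summarized as a log2 ratio of the two fluorescence signals for each spot. The sketch below shows one simple way this could be computed. The intensities are hypothetical, the assignment of Cy5 to the toxin-producing sample is an assumption, and the median centring and two-fold threshold are common illustrative choices rather than the analysis used in this thesis.

```python
import math
from statistics import median


def log2_ratios(cy5, cy3):
    """Per-spot log2(Cy5/Cy3), centred so that the median spot has a value of 0."""
    raw = [math.log2(red / green) for red, green in zip(cy5, cy3)]
    centre = median(raw)  # assumes most spots are not differentially expressed
    return [value - centre for value in raw]


# Hypothetical background-subtracted intensities for five spots.
cy5 = [800.0, 1600.0, 400.0, 6400.0, 200.0]  # assumed: toxin-producing sample
cy3 = [400.0, 800.0, 200.0, 400.0, 100.0]    # assumed: reference sample

for spot, value in enumerate(log2_ratios(cy5, cy3), start=1):
    # A centred log2 ratio of 1 corresponds to a two-fold increase.
    flag = "up" if value >= 1.0 else ""
    print(spot, round(value, 2), flag)
```

Centring on the median compensates for overall differences in labeling efficiency between the two dyes, so that only spot 4 in this example is flagged as up-regulated.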

Microarray  technology  has  proven  to  be  a  powerful  tool  for  gene  discovery 
programs  in  a  wide  range  of  organisms.  In  human  cancer  genetics,  for  example, 
microarray  studies  have  led  to  the  investigation  of  new  approaches  to  diagnosis  and  drug 
therapy  (Ochs  and  Goodwin,  2003).  Microarrays  have  also  been  extremely  useful  in 
characterizing  the  transcriptional  control  mechanisms  which  govern  physiological 
response in S. cerevisiae (Eisen et al., 1998; Spellman et al., 1998; Gasch et al., 2000).
The  success  of  mining  large  gene  expression  data  sets  and  the  potential  for  the 
information  to  be  useful  beyond  the  initial  goals  of  this  project  suggested  the  application 
of  microarray  technology  to  the  present  study  aimed  at  the  identification  of  genes  that  are 
differentially expressed in P. multiseries. Advantages of DNA microarrays include: 1)
thousands of transcripts can be analyzed simultaneously, 2) arrays allow simultaneous
comparison of multiple samples, 3) a relatively small amount of starting material is
required, 4) groups of genes with parallel expression patterns can be identified, 5) the
method is fast, efficient, and accurate, and 6) arrays can be useful for obtaining markers
of specific physiological states.
cDNA microarrays have recently been applied to ecological studies. Genes
associated  with  response  to  environmental  variation  and  stress  have  been  identified  using 
microarrays  in  a  number  of  studies.  For  example,  one  study  identified  cyanobacterial 
genes  that  were  differentially  expressed  under  conditions  of  high  light  acclimation, 
carbon  dioxide  fixation  and  photoprotection  (Hihara  et  al.,  2001).  Other  studies  selected 
for  cyanobacterial  genes  that  responded  rapidly  to  different  wavelengths  and  intensities 
of irradiance (Huang et al., 2002; Gill et al., 2002). In the dinoflagellate Pyrocystis lunula,
microarray analysis has been successfully used to identify genes which are differentially
expressed in relation to circadian rhythm (Okamoto and Hastings, 2003). Microarray
studies have also been applied directly to the analysis of environmental samples. For
example, Taroncher-Oldenburg et al. (2003) addressed denitrification in the Choptank
River-Chesapeake  Bay  system  using  microarray  methodology.  Other  studies 
demonstrated  the  effectiveness  of  microarray  technology  to  monitor  nitrogen  cycling 
genes  in  environmental  samples  (Wu  et  al.,  2001). 
Summary:

While  a  considerable  amount  of  research  has  been  dedicated  to  understanding  the 
biology  of  P.  multiseries,  many  questions  remain  unanswered.  A  limitation  to  further 
knowledge  of  the  biochemical  pathways  that  control  Pseudo-nitzschia  physiology  and 
growth,  including  domoic  acid  production,  is  posed  by  the  lack  of  understanding  of  the 
molecular  biology  of  this  marine  diatom.  In  general,  diatoms  have  not  received  the  same 
attention  within  the  field  of  molecular  biology  that  they  have  received  in  the  fields  of 
ecology  and  marine  biology.  However,  the  past  few  years  have  seen  encouraging 
developments  in  the  area  of  diatom  genomics.  Whole  genome  sequencing  has  recently 
been  completed  for  the  non-toxic,  centric  marine  diatom  Thalassiosira  pseudonana 
(Armbrust et al., University of Washington, and US Department of Energy Joint Genome
Institute). In addition, large-scale EST projects are currently being executed for T.
pseudonana (Hildebrand et al., Scripps Institution of Oceanography, and US Dept of
Energy Joint Genome Institute), and the non-toxic, pennate diatom Phaeodactylum
tricornutum (Chris Bowler, Laboratory of Molecular Plant Biology, Stazione Zoologica).

The  goals  of  this  study  were  to  establish  a  cDNA  library  and  EST  database  for  the 
toxic,  pennate  diatom  Pseudo-nitzschia  multiseries  and  to  screen  for  differentially 
expressed  genes  using  microarray  technology.  This  approach  was  selected  to  identify 
and  initiate  characterization  of  genes  associated  with  toxin  production  and  the  regulation 
of  growth  and  physiology  in  this  organism. 
Chapter II

Pseudo-nitzschia  multiseries  cDNA  Library  and  Expressed  Sequence  Tag  Analysis 
Abstract:


A  complementary  DNA  (cDNA)  library  and  expressed  sequence  tag  (EST) 
database  were  constructed  to  identify  and  initiate  characterization  of  actively  expressed 
genes  in  the  toxic  marine  diatom,  Pseudo-nitzschia  multiseries.  A  set  of  3872  ESTs  was 
generated  by  sequencing  of  2552  randomly  picked  cDNA  clones.  1320  cDNAs  were 
sequenced  in  both  the  3’  and  5’  directions,  while  1232  cDNAs  were  sequenced  in  either 
the  3’  or  5’  direction.  The  ESTs  were  assembled  into  1955  non-redundant  contigs,  of 
which  21%  demonstrated  significant  similarity  with  known  protein  coding  sequences. 
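
The counts reported above are mutually consistent, as the short tally below shows. It uses only the figures stated in this abstract; the variable names are illustrative, and the number of contigs with database matches is approximate because the 21% figure is rounded.

```python
both_ends = 1320           # cDNAs sequenced in both the 3' and 5' directions
one_end = 1232             # cDNAs sequenced in only one direction
contigs = 1955             # non-redundant contigs after assembly
fraction_with_hits = 0.21  # share of contigs with significant similarity

clones = both_ends + one_end                 # total cDNA clones sequenced
ests = 2 * both_ends + one_end               # two reads per double-ended clone
contigs_with_hits = round(contigs * fraction_with_hits)  # approximate count

print(clones, ests, contigs_with_hits)
```

The tally reproduces the 2552 clones and 3872 ESTs stated in the text and implies that roughly 411 contigs matched known protein coding sequences.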

The  P.  multiseries  EST  database  included  highly  significant  matches  with 
sequences  from  all  of  the  major  taxonomic  groups  described  within  the  universal 
phylogenetic  tree.  While  some  matches  undoubtedly  reflect  the  biases  of  the  sequence 
databases,  others  likely  reflect  the  evolutionary  history  of  diatoms.  Comparisons  of  the 
P.  multiseries  sequences  against  the  Thalassiosira  pseudonana  and  Phaeodactylum 
tricornutum sequence databases proved useful in identifying diatom-specific transcripts.
In addition, numerous transcripts matched neither any known sequence in the public
databases nor any entry in the T. pseudonana and P. tricornutum databases; these novel
sequences will potentially help to elucidate unique aspects of P. multiseries biology,
such as toxin production.

Key enzymes involved in C4 photosynthesis were revealed through sequence
similarity,  including  a  C4-specific  pyruvate,  orthophosphate  dikinase,  a 
phosphoenolpyruvate  carboxykinase,  a  phosphoenolpyruvate  carboxylase,  and  a  pyruvate 
carboxylase.  The  existence  of  a  C4  pathway  in  diatoms  is  currently  under  debate,  so  this 
discovery  is  particularly  exciting,  as  it  suggests  the  possibility  of  a  C4  mechanism  in  P. 
multiseries. Many candidate genes that may play a role in DA biosynthesis were
also  revealed  through  sequence  similarity  to  known  protein  coding  sequences.  Examples 
include  enzymes  involved  in  glutamate  metabolism,  such  as  5-oxo-L-prolinase, 
acetylglutamate  kinase,  NAD-specific  glutamate  dehydrogenase,  and  N-acetylglutamate 
semialdehyde  dehydrogenase. 
Introduction:


Pseudo-nitzschia  multiseries  represents  an  ecologically  important  species  within 
the  marine  phytoplankton.  P.  multiseries  belongs  to  the  division  Bacillariophyta, 
unicellular  brown  algae  commonly  called  diatoms,  which  contribute  significantly  to 
global  carbon  fixation  (Hasle,  1995;  Falkowski,  1998;  Smetacek,  1999;  Kooistra,  2003). 
Diatoms  form  the  base  of  the  food  web  in  many  marine  environments  and  play  a  major 
role  in  nutrient  cycling,  especially  in  the  biotransformation  of  silicon  into  silica  during 
cell  wall  synthesis  (Treguer  et  al.,  1995).  P.  multiseries  is  distinctive  among  the  marine 
diatoms,  because  it  is  one  of  the  only  known  organisms  to  produce  the  neurotoxin, 
domoic acid (DA) (Bates et al., 1989, 1998; Todd, 1993). DA is a neuroexcitatory,
water-soluble amino acid, which has caused poisonings of humans, marine mammals, and birds
through trophic transfer via shellfish consumption (Bates, 1989; Beltran, 1997; Scholin et
al., 2000).

Despite  its  ecological  importance,  the  molecular  characterization  of  P.  multiseries 
has  been  minimal.  This  is  illustrated  by  the  lack  of  protein-coding  sequences  available 
for  P.  multiseries  in  the  public  databases;  a  search  of  the  updated  NCBI  database  on 
August  8,  2004,  yielded  no  entries  for  P.  multiseries,  and  only  four  sequences  for  the 
related genera Nitzschia and Pseudo-nitzschia combined. These sequences included
malate  dehydrogenase,  which  was  characterized  in  the  marine  diatom  Nitzschia  alba 
(Yueh  et  al.,  1989).  The  other  three  sequences  likely  encode  6-phosphogluconate 
dehydrogenase,  cytochrome  oxidase,  and  a  delta-5  fatty  acid  desaturase,  based  on 
sequence  similarity  with  known  protein  coding  sequences  (Ehara  et  al.,  2000;  Andersson 
and  Roger,  2002).  The  lack  of  available  information  on  the  expressed  genome  of  P. 
multiseries  presents  a  limitation  to  further  understanding  the  metabolic  pathways  that 
control  cell  physiology,  including  toxin  production  and  growth.  Therefore,  a  genomic 
program  aimed  at  rapidly  cataloguing  actively  expressed  genes  by  sequencing 
complementary  DNAs  (cDNAs)  was  established  as  the  most  efficient  initial  approach  to 
directly  contribute  to  the  expansion  of  this  field  using  molecular  genomics. 




The  sequencing  and  subsequent  identification  of  cDNAs  by  similarity  with  known 
protein-coding  sequences,  called  expressed  sequence  tag  (EST)  analysis,  has  become  an 
important  and  well-established  technique  for  gene  discovery  (Liang  et  al.,  2000;  Rudd, 
2003).  The  basic  strategy  for  EST  analysis  requires  construction  of  a  cDNA  library  from 
actively  expressed  mRNAs,  followed  by  selection  of  cDNA  clones  at  random  to  perform 
single,  automated  sequencing  from  one  or  both  ends  of  the  insert.  The  sequences  are  then 
categorized  based  on  similarity  to  sequences  deposited  in  public  databases.  This 
approach,  which  allows  rapid  assignment  of  function  to  a  suite  of  actively  expressed 
genes,  is  especially  useful  in  organisms  or  tissues  that  previously  have  had  little  genetic 
inquiry  or  exploration.  For  example,  the  first  application  of  high-throughput  sequencing 
of  cDNA  clones  allowed  the  isolation  and  subsequent  characterization  of  numerous 
transcripts  specific  to  the  human  brain  (Adams  et  al.,  1991). 

The  application  of  high-throughput  sequencing  of  cDNA  clones  to  investigate  the 
biology  of  marine  algae  was  introduced  relatively  recently  with  a  study  on  the  marine 
kelp,  Laminaria  digitata  (Crepineau  et  al.,  2000).  At  the  onset  of  this  thesis  project,  little 
information  was  available  on  diatom  genomics,  specifically.  However,  the  past  few  years 
have  resulted  in  an  exciting  opening  of  the  field.  A  sequencing  project  on  the  marine 
diatom, Phaeodactylum tricornutum, has yielded a large EST dataset (Scala et al., 2002).
While  the  original  report  described  1000  ESTs,  a  recent  review  of  the  updated  sequence 
data  available  on-line  revealed  over  12,000  ESTs  deposited  from  P.  tricornutum.  These 
ESTs  were  derived  from  the  5’  end  of  the  cDNAs  and  correspond  to  approximately  5100 
non-redundant  gene-oriented  clusters  (Chris  Bowler,  Laboratory  of  Molecular  Plant 
Biology, Stazione Zoologica; http://avesthagen.sznbowler.com). The P. tricornutum
project  involves  a  multi-facility,  interactive  group  that  supports  the  data  analysis  and 
gene  annotation  of  this  large  set  of  data  and  has  now  initiated  the  sequencing  of  the 
complete  genome  of  P.  tricornutum.  Concurrently,  an  EST  project  on  the  marine  diatom, 
Thalassiosira  pseudonana,  has  yielded  17,000  ESTs,  generated  from  8500  cDNAs 
sequenced in both directions (Hildebrand et al., Scripps Institution of Oceanography, and
US  Dept  of  Energy  Joint  Genome  Institute;  http://avesthagen.sznbowler.com). 
The T. pseudonana EST project supports the annotation of genes in the recently
completed whole genome sequencing project on this diatom (Armbrust et al., University
of  Washington,  and  US  Department  of  Energy  Joint  Genome  Institute; 
http://genome.jgi-psf.org). 

In  the  present  study,  a  cDNA  library  and  EST  database  were  established  for  the 
toxic,  pennate  diatom,  Pseudo-nitzschia  multiseries.  This  project  has  currently  generated 
3872  ESTs,  corresponding  with  2552  cDNAs  that  were  assembled  into  1955  non- 
redundant  contigs.  The  sequence  information  presented  in  this  study  will  enable 
molecular  tools  to  be  further  exploited  in  order  to  advance  our  understanding  of  the 
metabolic  pathways  that  control  the  biology  of  P.  multiseries.  Comparative  studies 
across  the  three  diatom  genomes  should  prove  useful  to  the  study  of  functional  genomics 
and  phylogeny  among  the  diatoms. 
Materials and Methods: 


Culture conditions and RNA extraction: Pseudo-nitzschia multiseries clone
CL-125 was graciously provided by Stephen S. Bates (Department of Fisheries and
Oceans, Gulf Fisheries Center, Moncton, NB, Canada). This clone was originally
collected from Mill River, Prince Edward Island, Canada, on September 21, 2000, and
isolated on September 23, 2000. Clonal cultures of CL-125 were grown in 0.2 µm
filtered seawater enriched with f/2 nutrients (Guillard and Ryther, 1962). Batch cultures
were maintained at 20 °C, 100 µE m⁻² s⁻¹, 14:10 h LD cycle. Fifteen L of culture were
grown in 19-L borosilicate carboys; the cultures were aerated using aquarium pumps with
sterile  tubing  and  were  constantly  mixed  with  magnetic  stirrers.  Cells  were  harvested 
during  late  exponential  to  mid-stationary  growth  phase,  under  predominantly  toxin- 
producing  conditions. 

An  extraction  protocol  which  led  to  the  consistent  isolation  of  high-quality  mRNA 
from  P.  multiseries  was  developed  through  the  evaluation  of  a  series  of  standard  protocols 
for  RNA  isolation  from  other  organisms.  The  final  RNA  extraction  protocol  given  below 
yielded approximately 1 mg of total RNA and 10-12 µg of poly (A)+ RNA from 8 x 10⁸ P.
multiseries cells. P. multiseries cells were collected by centrifugation for 15 minutes at
1000 g. Total RNA was extracted by homogenizing the cells (Polytron) in TRIzol
(Invitrogen, Cat. No. 15596-018), which relies on lysis of the cells in the presence of both
phenol and guanidinium thiocyanate. Following homogenization, insoluble material was
removed  by  low  speed  centrifugation  of  the  samples,  which  increased  both  yield  and 
quality  of  the  resulting  total  RNA.  Precipitating  twice  with  salt  and  ethanol  also 
contributed  to  high  quality  total  RNA,  as  indicated  by  260/280  O.D.  ratios  and  gel 
electrophoresis.  Poly  (A)+  RNA  was  then  isolated  from  total  RNA  using  biotin-labeled 
oligo(dT)20 probe bound to streptavidin magnetic particles. The method relies on poly (A)
residues at the 3' ends of the mRNAs base-pairing with the oligo(dT)20 probe. The bound
polyadenylated  RNA  was  magnetically  isolated  from  the  total  RNA  and  purified  (Roche, 
Cat.  No.  1741985).  Percent  recovery  of  mRNA  from  total  RNA  was  approximately  1-1.2%. 

cDNA library (Figure 2-1): First-strand cDNA was prepared from poly (A)+ RNA using
Superscript II, NC-p7 (an RNA chaperone), and oligo pd(TZ) (an oligo (dT) primer with
some of the internal thymidine residues replaced with 3-nitropyrrole to minimize
mispriming to internal A-rich sequences). Double-stranded cDNA was generated using
RNase H, E. coli DNA polymerase I, and E. coli ligase (to add a polymeric tract to the
first-strand cDNA for initiation of second-strand synthesis). The ends of the cDNA were
polished with T4 DNA polymerase and BstXI adaptors were ligated to the cDNA ends.
The  cDNA  was  then  fractionated  on  sucrose  gradients.  Individual  size  fractions  were 
ligated  into  a  pUC-based  vector  and  transformed,  by  electroporation,  into  E.  coli  DH10B 
cells  (Das  et  al.,  2001).  Following  an  initial  library  plating,  individual  colonies  were 
picked  and  stored  at  -80°C  in  15%  glycerol  for  further  analysis.  Randomly  chosen  clones 
were  then  grown  overnight  in  1  mL  of  Terrific  Broth.  Plasmids  were  prepared  for 
sequencing  using  an  alkaline  lysis  method  modified  from  Sambrook  and  Russell  (2001). 
Alternatively,  insert  was  amplified  from  randomly  picked  clones  and  then  purified  using 
Millipore multiscreens (see methods and materials in array section, Chapter 3).

Sequence Analysis: Sequencing reactions were run on an automated DNA sequencer, ABI
3700  with  dye  terminators.  The  majority  of  the  sequencing  reactions  were  run  in  the 
laboratory  of  Jerry  Pelletier,  Biochemistry  Department,  McGill  University.  However, 
selected  cDNAs  from  the  expression  studies  presented  in  the  next  chapter  were 
sequenced  at  ACGT,  Inc.  ESTs  were  edited  to  remove  low  quality  data,  poly  (A)  tails, 
and  vector  sequence.  Automated  trimming  was  performed  using  Seqman  (DNAStar), 
followed  by  manual  editing  in  order  to  proof-read  and  further  remove  low  quality 
(ambiguous)  data  and  poly  (A)  tails  from  the  ends  of  the  sequence.  Vector  sequence  was 
removed  using  ContigExpress  (VectorNTI).  Vector  removal  was  then  verified  by 
attempting  to  align  vector  sequence  with  the  edited  cDNA  sequence  in  GenomeBench 
(VectorNTI). The sequences were further edited by hand to remove any trace vector
sequence  revealed  in  this  alignment  process. 

Multiple sequences from the same cDNA clone were assembled into consensus
sequences  using  Seqman  (DNAStar).  Clone  consensus  sequences  and  singleton  ESTs 
were  further  assembled  to  group  the  entire  sequence  dataset  into  unique  classes  of 
overlapping  identical  sequences,  referred  to  as  contigs  (Cooke  et  al.,  1997).  A  total  of 
1955  non-redundant  consensus  sequences  were  generated,  using  a  criterion  of  90% 
identity  observed  over  sequences  more  than  50  nucleotides  long.  These  parameters  were 
based  on  a  comparison  of  different  criteria  and  software  packages  that  revealed  that 
Seqman  (DNAStar)  yielded  the  most  consistent  results  using  these  limits,  as 
demonstrated  by  the  ability  to  group  redundant  sequences  together  consistently,  without 
including  non-related  sequences.  The  final  set  of  sequences  will  be  deposited  into  the 
NCBI  dbEST  database. 
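
The grouping criterion described above (at least 90% identity over an overlap of more than 50 nucleotides) can be illustrated with a short sketch. This is not the Seqman (DNAStar) algorithm, whose internals are not described here; it only checks the stated thresholds on an overlap that is assumed to be already aligned (equal-length strings, with "-" marking gaps). The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def meets_assembly_criterion(aligned_a: str, aligned_b: str,
                             min_identity: float = 90.0,
                             min_length: int = 50) -> bool:
    """True when the overlap is longer than min_length and at least min_identity % identical."""
    return (len(aligned_a) > min_length
            and percent_identity(aligned_a, aligned_b) >= min_identity)


# Hypothetical 60-nt overlap used only to exercise the thresholds.
overlap = "ACGT" * 15
near_identical = "TTT" + overlap[3:]        # 3 mismatches  -> 95% identity
divergent = "CATG" * 3 + overlap[12:]       # 12 mismatches -> 80% identity

print(meets_assembly_criterion(overlap, near_identical))    # grouped together
print(meets_assembly_criterion(overlap, divergent))         # kept apart
print(meets_assembly_criterion(overlap[:40], overlap[:40])) # overlap too short
```

In this sketch gap columns count against identity, which is one of several possible conventions; an assembler may score gaps differently.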

Individual  and  consensus  sequences  were  compared  with  known  sequences 
contained  within  the  public  non-redundant  protein  databases  using  the  Basic  Local 
Alignment  Search  Tool  provided  by  the  NCBI  server  (Altschul  et  al.,  1997; 
http://www.ncbi.nlm.nih.gov/BLAST/).  Significant  similarities  were  considered  for 
E-values less than or equal to 7E-5. The E-value is a parameter that describes the
number  of  hits  that  would  be  expected  by  chance;  this  value  indicates  the  statistical 
significance  of  a  given  pairwise  alignment.  The  lower  the  E-value,  the  more  significant 
the  hit.  Specific  P.  multiseries  sequences  were  also  searched  against  the  Thalassiosira 
pseudonana  genome  database  at:  http://genome.jgi-psf.org,  and  the  T.  pseudonana  EST 
database and the P. tricornutum database at: http://avesthagen.sznbowler.com/. In these
alignments,  %  identity  and  %  similarity  of  the  coding  sequences  compared  were  reported. 
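
As an illustration only, the significance cutoff described above can be applied programmatically. The sketch below assumes that BLAST results have been saved in the common 12-column tab-separated layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The searches in this study were run through the NCBI server, so the file layout, the function name significant_hits, and the example identifiers are assumptions made for the sketch and not part of the original analysis.

```python
import csv
import io

E_VALUE_CUTOFF = 7e-5  # significance threshold stated in the text


def significant_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query: (subject, evalue)} holding the best hit per query at or below the cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # not significant under the stated threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)  # lower E-value = more significant
    return best


# Hypothetical rows: identifiers and scores are invented for illustration.
rows = [
    ("contig_A", "subject_1", "63.0", "200", "74", "0", "1", "200", "5", "204", "2e-50", "190"),
    ("contig_A", "subject_2", "30.0", "80", "56", "0", "1", "80", "9", "88", "1e-3", "40"),
    ("contig_B", "subject_3", "28.0", "60", "43", "0", "1", "60", "2", "61", "0.5", "30"),
    ("contig_C", "subject_4", "41.0", "120", "71", "0", "1", "120", "3", "122", "7e-5", "48"),
]
example = "\n".join("\t".join(r) for r in rows)
hits = significant_hits(example)
print(sorted(hits))
```

Here contig_B is excluded because its only hit has an E-value above the cutoff, while contig_C is retained because the threshold is inclusive ("less than or equal to 7E-5").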
Results: 


A  cDNA  library  was  constructed  from  Pseudo-nitzschia  multiseries  cells 
harvested  during  predominantly  toxin-producing  conditions,  from  late  exponential  to 
mid-stationary  growth  phase.  A  total  of  19,200  cDNA  clones  were  individually  selected 
for  growth  and  storage,  after  the  initial  library  plating.  The  range  of  cDNA  insert  size 
was 500 to 4000 bp, averaging 1000 bp. A set of 2552 clones was randomly selected for
sequencing.  Of  these,  1320  cDNAs  were  sequenced  in  both  the  3’  and  5’  directions, 
while  1232  cDNAs  were  sequenced  in  either  the  3’  or  5’  direction.  Average  sequence 
length for individual reads was 675 bp, after vector removal and end-trimming (Table
2-1). 
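
The clone and read counts above can be cross-checked with a few lines of arithmetic. The sketch assumes that each clone sequenced in both directions contributes two reads and each clone sequenced in one direction contributes one read; the variable names are illustrative.

```python
both_directions = 1320  # cDNAs sequenced from both the 3' and 5' ends
one_direction = 1232    # cDNAs sequenced from only one end

clones_sequenced = both_directions + one_direction
est_reads = 2 * both_directions + one_direction

print(clones_sequenced)  # 2552 clones, as stated in the text
print(est_reads)         # 3872 reads
```

Under that assumption the totals agree with the 2552 cDNAs and 3872 ESTs reported for this project.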

Assessment  of  library  saturation,  based  on  number  of  clones  within  each  contig 
graphed  against  percent  frequency,  illustrated  that  total  redundancy  was  relatively  low 
(Figure  2-2).  Of  the  2552  cDNAs  analyzed  in  this  study,  sequence  assembly  revealed 
1955  represented  non-redundant  sequences  or  unique  contigs,  indicating  a  redundancy  of 
23.4%. The proportion of the P. multiseries cDNA library that appears in the sample of
reads in this study may be approximately estimated by C = 1 - n_1/n, where n_1 is the
number of genes that appear exactly once in the sampling and n is the total number of
clones sequenced in this study (Susko and Roger, 2004). The expected number of reads
required to discover a new gene may then be roughly estimated as E = 1/(1 - C). In this
study, n_1 = 1242 and n = 2552. So, coverage in this analysis equals approximately 0.51,
and  the  expected  number  of  reads  required  to  discover  a  new  gene  in  this  library  would 
be  2.05.  These  estimates  predict  that  further  sequencing  of  this  library  would  yield  an 
additional  2000  unique  transcripts.  Therefore,  including  rare  or  low-copy  transcripts,  the 
P.  multiseries  cDNA  library  likely  contains  greater  than  4000  expressed  genes  in  total. 
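
The estimates in this paragraph can be reproduced with a minimal sketch. The coverage and expected-reads formulas are the ones given in the text; the redundancy formula, (clones - contigs) / clones, is an assumption that happens to reproduce the reported 23.4%. Function names are illustrative.

```python
def library_coverage(singletons: int, total_clones: int) -> float:
    """Coverage estimate C = 1 - n_1/n, as given in the text."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    return 1.0 - singletons / total_clones


def expected_reads_for_new_gene(coverage: float) -> float:
    """Expected number of reads needed to find a new gene, E = 1/(1 - C)."""
    if coverage >= 1.0:
        raise ValueError("coverage must be below 1")
    return 1.0 / (1.0 - coverage)


def redundancy(total_clones: int, unique_contigs: int) -> float:
    """Assumed definition: fraction of clones that did not add a new contig."""
    return (total_clones - unique_contigs) / total_clones


n1, n, contigs = 1242, 2552, 1955  # values reported in this study
c = library_coverage(n1, n)
e = expected_reads_for_new_gene(c)
r = redundancy(n, contigs)

print(f"coverage C = {c:.2f}")
print(f"expected reads per new gene E = {e:.2f}")
print(f"redundancy = {r:.1%}")
```

With the values reported here the sketch gives C of about 0.51, E of about 2.05, and a redundancy of 23.4%, matching the figures quoted above.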

The P. multiseries deduced amino acid sequences were searched against the
public non-redundant (nr) protein database; 21.0% of the assembled consensus
sequences matched known proteins with a significant E-value of less than or equal to 7E-5
(Table 2-2). The P. multiseries cDNA sequences that demonstrated significant similarity
to known protein coding sequences were categorized into functional groups, shown in
Figure 2-3, while the putative identities of the individual non-redundant sequences that
demonstrated significant similarity to known proteins are listed under functional group
headings in Table 2-3.

Figure 2-2. Assessment of mRNA redundancy in the P. multiseries library by sequence assembly analysis. The number of
individual cDNA clone sequences per contig is plotted against the percent frequency of the independent contigs.

In  addition  to  known  proteins,  3.7%  of  the  P.  multiseries  EST  database  showed 
significant  similarity  to  hypothetical  sequences,  and  2.7%  showed  significant  similarity 
to  unknown,  environmental  sequences.  The  unknown,  environmental  sequences  were 
derived from a shotgun sequencing study in the nutrient-limited Sargasso Sea (Venter et
al.,  2004).  While  this  study  targeted  bacterial  populations  through  size  selection,  the  high 
sequence  similarity  with  P.  multiseries  may  indicate  that  their  samples  included 
eukaryotic  algae,  as  well.  Alternatively,  the  sequence  similarity  may  reflect  the 
evolutionary  history  of  diatoms.  In  addition,  some  P.  multiseries  sequences  with  high 
similarity  to  known  protein  coding  sequences  also  aligned  with  unknown  sequences  from 
the  Sargasso  Sea.  For  example,  one  environmental  sequence  aligned  closely  with  a 
P.  multiseries  sequence  that  also  showed  high  sequence  similarity  to  the  coding  sequence 
for  phosphoenolpyruvate  carboxykinase,  an  enzyme  involved  in  gluconeogenesis, 
anaplerotic  reactions,  and  C4  photosynthesis  (Lea  et  al.,  2001).  Characterization  of  P. 
multiseries  sequences  that  are  similar  to  unknown  Sargasso  Sea  sequences  may  offer 
further  understanding  of  the  role  that  photosynthetic  plankton  play  in  the  unique 
environment  of  the  open  ocean. 

The  P.  multiseries  EST  database  included  highly  significant  matches  for  all  of  the 
major  groups  in  the  universal  phylogenetic  tree  (Figure  2-4).  While  some  hits  against 
distant  species  may  reflect  the  biases  of  the  sequence  databases,  others  likely  reflect  the 
evolutionary history of diatoms. Diatom lineages appear to have arisen through a
secondary endosymbiosis in which a heterotrophic flagellate engulfed a single-celled
red  alga,  which  itself  traces  back  to  a  primary  endosymbiotic  event  in  which  a 
heterotrophic protist engulfed a cyanobacterium (Bhattacharya et al., 2003). Therefore,
diatoms would be expected to have genes deriving from both photosynthetic and
heterotrophic lineages. The P. multiseries deduced amino acid sequences aligned most
closely to eukaryotic proteins for 80.8% of the significant matches, with an almost even
split between heterotrophic and phototrophic organisms. Similarity to bacterial
sequences accounted for 18.7% of the P. multiseries sequences, of which 3.6% were of
cyanobacterial origin and 8.3% were of proteobacterial origin. Proteobacteria are
believed to be most closely related to the ancestral bacterial cell that led to mitochondria
in eukaryotes (Gray et al., 1999). Only 2 archaeal sequences demonstrated significant
similarity with the P. multiseries sequences. One of these corresponded to a capsular
polysaccharide biosynthesis protein, with an E-value of 4E-72, 46% identity, and 64%
similarity. Searching the T. pseudonana genome produced several similar sequences
with up to 74% identity, 89% similarity to the P. multiseries sequence, and 42% identity,
62% similarity to the archaeal sequence (E-value, 2E-71). The P. tricornutum EST
database did not produce a similar sequence. The absence of this sequence from the
P. tricornutum EST database could be the consequence of low expression levels of the
orthologous transcript. In contrast, the P. multiseries library contained at least 9 copies
of this transcript, and the T. pseudonana genome appeared to contain four closely related
sequences. Capsular polysaccharide biosynthesis proteins are involved in cell membrane
biogenesis and signaling (Roberts, 1996). Therefore, this transcript may represent the
discovery of a new protein family involved in cell membrane structure and function in
diatoms.

Figure 2-3. Functional classification of derived coding sequences from Pseudo-nitzschia multiseries.

Table 2-3. Non-redundant consensus sequences from the Pseudo-nitzschia multiseries cDNA library
that demonstrated significant similarity to known proteins in NCBI’s protein database.

PSN466  3  1546  AAM64493.1  Hydroxymethyltransferase  Arabidopsis thaliana  e-123
PSN596  2  1220  AAM64677.1  Iron-sulfur cluster assembly complex protein  Arabidopsis thaliana  7.00E-13
53F5  1  1393  NP_191712.1  Ketopantoate hydroxymethyltransferase  Arabidopsis thaliana  5.00E-56
desulfuricans
PSN918  2  808  T50771  Peptidylprolyl isomerase  Solanum tuberosum  8.00E-12
PSN969  2  840  AAP82284.1  Phenylalanine hydroxylase  Danio rerio  2.00E-44
16A3  1  743  BAD07294.1  Prolyl 4-hydroxylase  Nicotiana tabacum  2.00E-08
175A9  1  883  NP_661303.1  Phosphoglycerate mutase  Chlorobium tepidum  4.00E-53
186D10  1  782  NP_952663.1  Phosphoglycerate mutase  Geobacter sulfurreducens  3.00E-51
PSN0100  2  919  NP_869505.1  PPi-phosphofructokinase  Pirellula sp. 1  4.00E-67
PSN1264  2  833  NP_571625.1  Pyruvate carboxylase  Danio rerio  2.00E-31

b. Membrane and cell wall proteins (JV)

7A12  1  814  A60610  Circumsporozoite protein precursor  Plasmodium brasilianum  1.00E-24
PSN0756  2  1136  Q03650  Cysteine-rich, acidic integral membrane protein  Trypanosoma brucei  9.00E-19
PSN1161  3  1102  P26182  Actin  Achlya bisexualis  e-117
186F1  1  834  S49007  Actin  Pythium irregulare  e-103
136F3  1  628  P26182  Actin  Achlya bisexualis  4.00E-67
47B12  1  274  BAB62395.1  Actin  Nannochloris coccoides  5.00E-09
169G11  1  766  1QGK  Importin Beta  Homo sapiens  2.00E-25
78B2  1  867  AAH54537.1  Kif4 protein, kinesin  Mus musculus  6.00E-08
52C11  1  751  BAC56912.1  Kinesin motor protein  Dictyostelium discoideum  2.00E-26
169F9  1  794  AAC78595.1  Hcr2-5B  Lycopersicon esculentum  4.00E-12
53H3  1  1265  AAN17454.1  Hypersensitive-induced reaction protein 4  Hordeum vulgare subsp.  1.00E-51
PSN0089  2  1142  AAF68391.1  Hypersensitive-induced response protein,  Zea mays  8.00E-30
167D7  1  780  NP_071909.1  SIL1, ER chaperone, BiP-associated protein  Homo sapiens  3.00E-05
50A2  1  585  NP_996761.1  T-complex protein 1 delta subunit  Gallus gallus  6.00E-57

Figure 2-4. The P. multiseries EST database included highly significant matches for all of the major
groups in the universal phylogenetic tree. The number (A) and percentage (B) of P. multiseries
non-redundant sequences that showed significant sequence similarity to protein coding sequences from a
species within the given group are presented. Domain labels: Bacteria (18.7%), Archaea, Eukarya (80.8%).
Universal Phylogenetic Tree modified from Woese, 2000. This tree is derived from the phylogenetic
comparison of rRNA sequences, and indicates 3 major domains. Multi-gene phylogenetic comparisons
have further modified the current understanding of relationships within the Eukarya (see Figure 2-5).

The  P.  multiseries  cultures  used  for  RNA  extraction  were  non-axenic.  However, 
bacterial  RNA  contamination  was  expected  to  be  a  minimal  concern  in  the  P.  multiseries 
EST  database  because  most  bacteria  do  not  synthesize  polyadenylated  RNA  during 
mRNA  transcription.  A  number  of  findings  support  the  view  that  the  presence  of  bacteria 
in  the  P.  multiseries  cultures  did  not  contribute  to  the  content  of  the  cDNA  library. 
Sequence  analysis  of  the  P.  multiseries  cDNAs  did  not  reveal  any  obvious  contamination 
concerns  that  might  have  arisen  from  the  presence  of  a  poly  (A)+  bacterial  contaminant, 
as  discussed  below.  In  addition,  mRNAs  extracted  from  presumably  axenic  cultures  that 
were hybridized to the P. multiseries cDNA microarray reported in Chapter 3 of this
thesis  further  confirmed  the  assignment  of  these  mRNAs  as  P.  multiseries  transcripts. 

The  library  did  not  reveal  rRNA  fragments  or  other  contaminating  sequences,  validating 
the  overall  quality  of  the  poly  (A)+  RNA  used  to  construct  this  library. 

The  P.  multiseries  deduced  amino  acid  sequences  that  matched  most  closely  with 
bacterial  proteins  were  searched  against  the  T.  pseudonana  and  P.  tricornutum  databases 
to  evaluate  if  these  were  contaminating  sequences  from  bacteria  in  the  cultures  (Table 
2-4).  The  ten  alignments  with  the  highest  E-values  were  chosen  for  this  analysis.  Each  of 
these  sequences  matched  most  closely  with  the  other  diatom  sequences,  supporting  the 
validity  of  the  designation  of  these  transcripts  as  derived  from  P.  multiseries.  Most  of 
these  sequences  matched  most  closely  to  P.  tricornutum,  consistent  with  the  closer 
evolutionary  relationship  between  the  two  pennate  diatoms  (Kooistra  et  al.,  2003;  Damste 
et  al.,  2004). 

An  emerging  model  of  phylogenetic  relationships  among  the  eukarya,  using 
combined data from rRNA, alpha-tubulin, beta-tubulin, actin, and elongation factor-1
alpha (EF-1 alpha) has revealed 8 major groups (Figure 2-5) (Baldauf et al., 2000;
Baldauf, 2003). Multi-gene datasets for taxa within these groups are necessary to
facilitate  the  resolution  of  the  branches  of  the  eukaryotic  tree  and  to  further  define  the 
root  of  the  tree.  Multiple  copies  of  actin,  beta-tubulin,  and  EF-1  alpha  were  identified  in 
P.  multiseries.  These  genes  and  others,  such  as  the  chaperone,  Hsp70,  should  assist  in 
reconstructing  phylogenetic  relationships  both  within  the  Pseudo-nitzschia  spp.,  and 
within  its  major  group,  Heterokonta.  Lundholm  et  al.  (2002)  suggest  a  paraphyletic 
origin  of  Pseudo-nitzschia  spp.,  based  on  rRNA  and  morphological  data.  Multi-gene 
data  sets  are  more  reliable  than  single  gene  studies,  and  additional  analyses  using  other 
genes  are  necessary  to  validate  this  conclusion  and  further  define  the  relationship  among 
toxin  and  non-toxin-producing  strains  of  Pseudo-nitzschia. 

As indicated on the eukaryotic tree, among the P. multiseries sequences showing
significant similarity to known proteins, several were most similar to coding sequences in
other diatoms. These included a silicon transporter, a fucoxanthin-chlorophyll a/c
light-harvesting protein, two glyceraldehyde-3-phosphate dehydrogenases, a delta-6 fatty acid
desaturase, a phosphoglycerate kinase precursor protein, and two chaperones, BiP and
Hsp70. In addition, numerous significant similarities were found within the major
heterokont group, including a 6-phosphogluconate dehydrogenase, an electron
flavoprotein beta subunit, a GTP-binding protein, an S-adenosyl methionine synthetase, a
histone H3, a number of different actin sequences, and the chaperones Hsp70 and
Hsp90-1. The multiple actin hits may represent a multiple-copy actin gene family, which has
been demonstrated in the oomycetes, Lagenidium giganteum and Pythium irregulare
(Bhattacharya and Stickel, 1994).

The  functional  classification  of  derived  coding  sequences  from  P.  multiseries 
revealed  that  proteins  involved  in  translation  represented  a  large  proportion  of  the  EST 
database  (12.5%).  The  most  abundant  mRNA  in  the  entire  P.  multiseries  sequence 
assembly  was  EF-1  alpha,  with  73  representative  cDNAs  (Table  2-5).  Other  cDNA 
libraries  have  also  observed  high  representation  of  EF-1  alpha,  such  as  the  L.  digitata 
library  (Crepineau  et  al.,  2000).  EF-1  alpha  modulates  a  diverse  range  of  cellular 
activities,  including  protein  synthesis,  cell  growth,  motility,  protein  turnover,  and  signal 
transduction (Ridgely et al., 1996). The critical role of EF-1 alpha in regulating cellular
activities  suggests  that  it  is  essential  to  P.  multiseries  biology. 

The P. multiseries EF-1 alpha deduced amino acid sequence was most similar to EF-1
alpha in the choanoflagellate, Monosiga brevicollis, yielding an E-value of 1E-115, with
identity and similarity of 51% and 70%, respectively. Searching this P. multiseries
sequence against the T. pseudonana genome detected five possible EF-1 family members.
The best hit produced an alignment of 1303 bp, with identity and similarity of 83% and
87%, respectively. BLAST analysis of the T. pseudonana sequence against the nr database
also demonstrated the highest similarity to Monosiga brevicollis, with an E-value of
1E-120, 49% identity, and 67% similarity. While the P. tricornutum database also
appeared to include numerous elongation factors, the highest similarity found to the
putatively identified P. multiseries EF-1 alpha was 56% identity and 75% similarity.
The P. tricornutum sequence showed strong similarity to EF-1 alpha from Euglena
gracilis, represented by an E-value of 1E-104, with 84% identity and 90% similarity.
The sequence data available from these three diatom projects will allow further
examination of the organization and expression of EF-1 alpha genes to determine both
functionality and divergence of the EF-1 alpha genes within these groups.

Table 2-5. Most prevalent mRNAs as Measured by Redundancy

PSN  cDNAs  Contig  NCBI
♦♦Novel sequences did not show any sequence similarity to Thalassiosira pseudonana genome sequence,
identity against T. pseudonana.

The  P.  multiseries  cDNA  library  also  included  four  novel  sequences  that  were 
highly  expressed.  These  sequences  did  not  match  any  sequences  in  the  other  diatom 
databases,  nor  the  public  nr  protein  and  nucleotide  databases.  Therefore,  further 
characterization  of  these  transcripts  may  offer  valuable  insight  into  unique  aspects  of  P. 
multiseries  biology.  Other  highly  redundant  mRNAs  included  three  transcripts  that  were 
up-regulated  during  toxin  production;  these  included  3-carboxymuconate  cyclase, 
phosphoenolpyruvate  carboxykinase,  and  a  long-chain  fatty  acid  CoA  ligase  (discussed 
in  the  next  chapter).  In  addition,  another  highly  expressed  mRNA  showed  high  sequence 
similarity  to  a  subtilisin-type  alkaline  serine  protease.  These  peptidases  may  be  involved 
in  cell  wall  synthesis  or  scavenging  nutrients  from  the  environment,  so  this  transcript 
may  also  reveal  insight  into  either  of  these  important  activities  in  P.  multiseries  biology 
(Miyamoto  et  al.,  2002;  Siezen  and  Leunissen,  1997;  Graycar,  1999). 

Surprisingly,  only  one  P.  multiseries  cDNA  sequence  coded  for  fucoxanthin, 
chlorophyll  a,c-binding  protein  (FCP),  and  2  cDNAs  coded  for  other  light  harvesting 
proteins  (LHP).  The  FCPs  are  major  components  of  the  photosystem  II-associated  light 
harvesting  complex  in  diatoms  and  other  brown  algae  (Bhaya  and  Grossman,  1993).  In 
both  the  L.  digitata  and  P.  tricornutum  EST  databases,  FCPs  and  LHPs  were  multigenic 
and  represented  highly  redundant  mRNAs  (Crepineau  et  al.,  2000;  Scala  et  al.,  2002).  In 
the  public  nr  protein  database,  P.  multiseries  FCP  aligned  most  closely  with  the  diatom 
Skeletonema  costatum  (E-value  2E-50,  63%  identity,  69%  similarity),  which  was  also 
reported  to  contain  multiple  copies  of  this  gene.  Searching  the  P.  tricornutum  EST 
database revealed 63% identity and 76% similarity with one of the P. tricornutum FCPs.
This single member of the FCP multi-gene family was represented by 18 separate cDNAs
in  the  P.  tricornutum  database.  P.  multiseries  array  experiments  in  the  next  chapter 
confirmed  that  P.  multiseries  FCP  was  down-regulated  during  toxin  production.  Leblanc 
et  al.  (1999)  monitored  FCP  expression  in  dark-adapted  cultures  of  the  centric  diatom 
Thalassiosira  weissflogii  and  found  that  mRNA  levels  increased  5-  to  6-  fold  in  response 
to  white  light  irradiation.  In  the  growth  experiments  used  for  the  cDNA  library 
preparation, cells were grown at 100 µE m⁻² s⁻¹ on a 14:10 h LD cycle. In the growth
experiments completed for the differential expression studies, cells were grown at 100 µE
m⁻² s⁻¹ under 24 h light. Cells were harvested during the light cycle in both experiments, so
down-regulation does not appear to be a response to changes in light regime in the P.
multiseries experiments. Oeltjens et al. (2004) showed that steady-state mRNA
concentrations  of  FCP  in  the  centric  diatom  Cyclotella  cryptica  oscillated  in  a  circadian 
manner.  Again,  the  differences  in  culture  conditions  and  harvesting  during  the  light 
cycle  would  suggest  that  circadian  rhythms  were  not  controlling  FCP  expression  in  P. 
multiseries.  However,  down-regulation  of  FCP  in  P.  multiseries  was  correlated  with 
stationary  growth,  when  photosynthesis  would  presumably  decrease  as  cell  growth  slows 
due to some limiting factor. The pathways leading to chlorophyll and DA production
may both draw on a pool of glutamate (Bates et al., 1998); therefore, the down-regulation
of FCP also correlates well with the onset of DA production. The P. multiseries FCP
sequence  identified  in  this  study  can  now  be  used  to  probe  for  nuclear-encoded  FCPs  of 
this  gene  family  in  P.  multiseries,  and  to  further  investigate  FCP  regulation  and  control  in 
P.  multiseries. 

The  P.  multiseries  EST  database  also  led  to  the  discovery  of  a  protein  coding 
sequence  demonstrating  high  similarity  to  an  enzyme  involved  in  the  C4  pathway  of 
photosynthetic  carbon  assimilation.  This  transcript  shared  67%  similarity,  and  53% 
identity  (E-value  E-140)  with  a  C4-specific  pyruvate,  orthophosphate  dikinase  (PPDK) 
from  Miscanthus  x  giganteus  (Naidu  et  al.,  2003).  PPDK  is  localized  to  chloroplasts  in 
C4 plants and catalyzes the conversion of pyruvate to phosphoenolpyruvate. An amino-terminal
sequence of the C4-PPDK directs entry of the precursor protein into chloroplasts
(Agarie et al., 1997). The separation of enzymes involved in C4 and Calvin cycles into
cellular  compartments  would  allow  C4  photosynthesis  to  occur  in  a  single-celled 
organism,  such  as  P.  multiseries,  without  the  complex  tissue  structure  of  higher  plants. 
Other  key  enzymes  involved  in  C4  photosynthesis  that  were  found  in  the  P.  multiseries 
EST  database  included  phosphoenolpyruvate  carboxykinase,  phosphoenolpyruvate 
carboxylase,  and  pyruvate  carboxylase.  The  existence  of  a  C4  photosynthetic  pathway  in 
diatoms  has  been  debated  (Reinfelder  et  al.,  2000;  Johnston  et  al.,  2001),  and  the 
discovery  of  a  potential  C4-specific  PPDK  in  P.  multiseries  suggests  the  exciting 
possibility  that  a  C4  mechanism  is  active  in  P.  multiseries.  This  discovery  would 
potentially  contribute  to  the  revision  of  current  hypotheses  on  the  evolutionary  history  of 
C4  photosynthesis  and  provide  further  insight  into  the  photosynthetic  activities  and 
ecological  success  of  marine  diatoms.  (This  hypothesis  is  discussed  further  in  chapter  4.) 

Ribulose-1,5-bisphosphate carboxylase/oxygenase (rubisco) is another principal
carbon fixation enzyme, which is alleged to represent the most abundant enzyme on earth
(Barraclough, 1979; Smith, 1981). While mRNAs encoding this enzyme have been
found in abundance in plant EST databases (e.g., Hofte et al., 1993), diatoms are known to
have plastid-encoded rubisco, which would account for why this gene was not identified
in the P. multiseries library (Hwang and Tabita, 1989, 1991). These results are consistent
with those found in the P. tricornutum EST database.

The P. multiseries database included a high number of sequences related to fatty acid and
lipid metabolism. These molecules may be involved in cell membrane synthesis, fuel for metabolism, or
synthesis of the DA isoprenoid side-chain. In addition, many lipid molecules mediate
signal  transduction.  Enzymes  that  control  production  of  lipid  signaling  molecules  in 
plants  include  phospholipases,  lipid  kinases,  and  phosphatases  (Wang,  2004).  Diatoms 
must  respond  to  constantly  changing  environmental  conditions,  so  signal  transduction 
pathways  are  important  to  their  survival.  In  addition  to  numerous  other  signaling 
molecules  found  in  P.  multiseries,  three  potential  lipid-signaling  enzymes  were 
identified, including inositol 5-phosphatase and two phospholipases. One of these
enzymes, phospholipase A2, appears to activate a defense response in the diatom
Thalassiosira rotula (Pohnert, 2002). The study of lipid signaling in plants is still in its
early  stages,  so  the  discovery  of  a  number  of  genes  that  are  potentially  involved  in  lipid 
signaling  pathways  may  offer  an  opportunity  to  facilitate  the  advancement  of  our 
understanding  of  these  pathways  in  P.  multiseries  and  other  photosynthetic  organisms. 

Other  P.  multiseries  transcripts  of  interest  included  one  likely  to  encode  a  silicon 
transporter,  SIT.  Silicon  transport  is  essential  to  silica  metabolism,  so  the  identification 
of  SIT  offers  a  useful  tool  to  study  cell  wall  synthesis  in  P.  multiseries  (Hildebrand  et  al., 
1998).  P.  multiseries  cDNAs  with  significant  similarity  to  ferredoxin  and  flavodoxin 
coding  sequences  may  prove  useful  for  exploring  iron  limitation  in  Pseudo-nitzschia  spp. 
(McKay,  1997;  Erdner  et  al.,  1999).  Finally,  P.  multiseries  sequences  that  appear  to 
encode  cell  division  genes,  such  as  cell  division  cycle  27,  may  facilitate  the  development 
of new methods for measuring population growth rates in Pseudo-nitzschia spp. (Lin et
al., 1998, 1999, 2000).

Discussion:


The  EST  database  provides  original  information  on  the  expressed  genome  of 
P.  multiseries,  which  will  help  to  facilitate  further  studies  into  the  physiology,  ecology, 
and  evolutionary  history  of  this  organism.  Comparative  studies  among  the  three  diatoms, 
T.  pseudonana,  P.  tricornutum,  and  P.  multiseries,  will  likely  facilitate  further 
understanding  of  the  intricacies  of  diatom  biology  through  molecular  genomics.  This 
work  represents  an  entry  into  the  study  of  metabolic  pathways  in  P.  multiseries,  and  has 
begun  to  reveal  new  information  about  P.  multiseries  biology.  For  example,  the  presence 
of  novel  sequences  that  did  not  show  sequence  similarity  to  any  of  the  sequences  in  the  T. 
pseudonana  or  P.  tricornutum  databases  suggests  that  this  diatom  contains  divergent 
sequences  that  are  specific  to  the  biology  of  P.  multiseries. 

The genome size of P. tricornutum was recently estimated to be 13 Mb (± 6 Mb)
(Scala et al., 2002). The T. pseudonana genome size has been estimated to be 34.3 Mb, while
the number of protein-encoding genes in T. pseudonana has been estimated to be
approximately 11,000 (http://genome.jgi-psf.org). It is likely that the genome
size of P. multiseries is in the same range as those of these diatoms. The estimate of EST number
identified in our study of P. multiseries (~4,000) under the specific physiological states is
lower than the 11,000 described for T. pseudonana. However, it is reasonable to presume
that additional transcripts will be discovered through the additional characterization of
the current cDNA library as well as the study of cDNAs derived from additional
physiological states.

As  the  sequences  that  are  novel  to  P.  multiseries  are  further  characterized,  they 
may  offer  a  useful  tool  for  looking  at  evolutionary  relationships  within  the  Pseudo- 
nitzschia  spp.  Genes  associated  with  toxin  production  will  be  most  useful  for 
understanding  the  relationships  between  toxin-  and  non-toxin-producing  Pseudo- 
nitzschia  spp.,  and  for  monitoring  toxin  production  in  the  field.  Many  possible  candidate 
genes  that  may  play  a  role  in  DA  biosynthesis  were  revealed  in  the  EST  sequencing 
project. Examples include genes likely to encode enzymes involved in isoprenoid,
pyruvate, or glutamate metabolism, such as delta 6 fatty acid desaturase, phosphoenolpyruvate
carboxykinase, glutamate dehydrogenase, and 5-oxo-L-prolinase. cDNA array
experiments  were  designed  in  the  next  chapter  to  select  for  genes  that  were  specifically 
correlated  with  toxin  production;  this  dataset  offers  useful  target  genes  for  further 
characterization,  which  should  lead  to  a  better  understanding  of  P.  multiseries  biology 
and  DA  biosynthesis,  both  in  the  lab  and  field. 

The EST study contributes 411 newly identified coding sequences from Pseudo-nitzschia
multiseries. These data can now be used to identify nuclear-encoded genes from
P.  multiseries  or  other  related  diatoms  and  to  further  characterize  the  role  of  specific 
genes in P. multiseries biology.

Chapter III


Gene  Expression  Profiling 


Abstract: 


A  cDNA  microarray  was  designed  to  screen  for  differentially  expressed  genes 
under  toxin-producing  vs.  non-toxin-producing  conditions  in  Pseudo-nitzschia 
multiseries,  in  order  to  begin  to  understand  the  biochemical  pathways  and  physiological 
control  mechanisms  which  relate  to  toxin  production  in  this  organism.  Expression 
analysis  of  5,372  cDNAs  revealed  121  up-regulated  cDNAs,  representing  12  unique 
transcripts,  and  51  down-regulated  cDNAs,  representing  15  unique  transcripts.  Up- 
regulated  transcripts  encoded  protein  sequences  with  structural  similarity  to  a  3- 
carboxymuconate  cyclase,  phosphoenolpyruvate  carboxykinase,  an  amino  acid 
transporter,  a  small  heat  shock  protein,  a  long-chain  fatty-acid-CoA  ligase,  and  an 
aldo/keto  reductase.  Down-regulated  transcripts  included  sequences  with  similarity  to  a 
key regulatory enzyme involved in glycolysis, PPi-phosphofructokinase, and a light
harvesting  protein,  fucoxanthin-chlorophyll  a/c  light  harvesting  protein.  These  results 
provide  a  framework  for  investigating  the  control  of  toxin  production  in  P.  multiseries. 
These  transcripts  may  also  be  useful  in  ecological  field  studies  in  which  they  may  serve 
as signatures of toxin production.

Introduction:


Domoic  acid  (DA)  is  a  phycotoxin  produced  by  a  group  of  marine  algae  limited  to 
certain species of the diatom genera Pseudo-nitzschia, Nitzschia, and Amphora, and the
red macroalgae Chondria, Alsidium, Amansia, Digenea, and Vidalia (Takemoto and
Daigo, 1958; Wright et al., 1989; Bates et al., 1998; Bates, 2000). Accumulation of DA
by shellfish that filter-feed on Pseudo-nitzschia cells and subsequent transmission of the neurotoxin
to humans has resulted in severe illness, designated amnesic shellfish
poisoning (ASP) due to symptoms characterized by memory loss (Bates et al., 1989;
Bates, 1998; Wright et al., 1989). DA is a neuroexcitatory amino acid that exhibits
structural  similarity  with  glutamic  acid,  kainic  acid,  and  proline  (Figure  1-1).  DA  binds 
to  glutamic  acid  receptors  with  an  affinity  up  to  100  times  that  of  glutamate,  leading  to 
prolonged  depolarization  and  ultimately  swelling  and  cell  death  in  neurons  exposed  to 
this  water  soluble  amino  acid  (Stewart  et  al.,  1990;  Olney,  1994).  Efforts  to  discover  the 
environmental  factors  that  stimulate  DA  production  by  Pseudo-nitzschia  spp.  have  led  to 
a  greater  understanding  of  the  physiology  and  ecology  of  these  organisms,  yet  the 
characterization of the biosynthetic pathways leading to DA synthesis has been minimal,
limited to two 13C- and 14C-labelling studies (Douglas et al., 1992; Ramsey et al., 1998)
and  more  recently  to  a  computational  modeling  approach  (Smith  et  al.,  2001;  Smith, 
personal  communication). 

The  carbon  labeling  experiments  supported  condensation  of  an  activated 
glutamate  derivative  from  the  citric  acid  cycle  with  an  isoprenoid  chain,  such  as  geranyl 
pyrophosphate,  and  subsequent  cyclization  as  a  possible  mechanism  for  DA  biosynthesis 
(Figure 1-2). On the other hand, Smith and colleagues have focused on the relationship
of proline to DA metabolism, by modeling and measuring amino acid levels to show that
proline and DA are inversely correlated, suggesting that either 1) proline is an
upstream precursor to DA, or 2) DA substitutes for the physiological function of proline.
Smith goes on to suggest a biochemical model describing the hypothesized derivation of
3-hydroxy-glutamate from proline metabolism, leading to DA synthesis (Figure 1-3).
The two proposed models for DA synthesis are linked by 3-hydroxy-glutamate, which the
former  proposal  extends  to  suggest  condensation  of  this  glutamate  derivative  with  an 
isoprenoid  chain  and  subsequent  cyclization  to  form  the  pyrrolidine  ring  of  DA.  Both  of 
these  proposed  pathways  suggest  many  potential  metabolic  schemes,  and  further 
understanding  of  DA  biosynthesis  would  be  limited  without  an  investigation  into  the 
genes  that  govern  the  regulation  of  these  pathways.  Therefore,  the  goal  of  this  study  was 
to  identify  genes  that  are  up-regulated  during  toxin  production  in  an  effort  to  advance  our 
understanding  of  DA  biosynthesis  and  regulation,  and  to  provide  further  insight  into  the 
overall  physiology  of  P.  multiseries. 

DA  production  has  been  shown  to  begin  during  the  late  exponential  growth  phase 
and  peak  during  the  stationary  phase,  when  division  of  the  entire  population  of  cells 
slows  due  to  Si  or  P  limitation  (Bates,  1998).  This  study  applied  cDNA  microarray 
technology  to  investigate  gene  expression  in  P.  multiseries  during  high-toxin-producing 
vs.  low-toxin-producing  conditions  by  comparing  mRNAs  from  cells  that  were  in 
exponential  phase  to  cells  that  were  in  stationary  phase.  Comparative  analysis  of  cells 
harvested  over  the  growth  cycle  of  P.  multiseries  would  likely  select  for  genes  associated 
with DA biosynthesis, transport, cell cycle progression, cell signaling, reproduction, and
stress  response.  The  construction  of  the  P.  multiseries  cDNA  library  (chapter  II) 
facilitated  the  manufacture  of  P.  multiseries  cDNA  microarrays  to  screen  the  library  for 
genes  or  clusters  of  genes  that  were  up-regulated  during  toxin  production.  Analysis  of 
this  large  set  of  expression  data  has  revealed  several  candidate  genes  that  may  be 
involved in DA biosynthesis, stress response, and carbohydrate metabolism.

Materials and Methods:


The  technical  and  software  options  in  microarray  analysis  are  vast  and  no  single 
protocol  is  available  for  individual  cDNA  array  projects;  therefore,  protocols  must  be 
optimized  for  each  individual  application  of  this  technology.  Alternative  protocols  were 
evaluated  throughout  each  step  of  the  cDNA  microarray  approach  to  expression  profiling 
in  Pseudo-nitzschia  multiseries,  from  array  construction  through  data  analysis.  The  final 
protocols  used  in  the  P.  multiseries  project  are  presented  in  the  following  sections,  with 
notes  on  alternatives,  when  useful  or  informative. 

Growth Experiments: Pseudo-nitzschia multiseries strains used in this study were
graciously provided by Stephen S. Bates (Department of Fisheries and Oceans, Gulf
Fisheries Center, Moncton, NB, Canada). The strains included CLN-125, CLN-125 Axenic,
and CLN-191. P. multiseries cells were grown in 0.2 µm-filtered seawater
enriched with f/2 nutrients (Guillard, 1975). Initial inoculum was acclimated to
experimental culture conditions, and cells were in exponential growth phase. Batch
cultures were maintained at 20°C and 100 µE m⁻² s⁻¹ under 24 h light. Fifteen L of culture were
grown in 19-L borosilicate carboys; cultures were aerated using an aquarium pump and
sterile tubing and were constantly mixed with magnetic stirrers.

Samples  were  taken  every  two  to  three  days  for  cell  counts,  DA  analysis,  and 
nutrient  analysis.  Cell  concentrations  were  estimated  by  averaging  the  number  of  cells 
enumerated  by  light  microscopy  using  a  Neubauer  hemacytometer  chamber  in  three 
separate  counts  of  individual  samples  preserved  in  Lugol’s  iodine.  DA  concentrations 
were  analyzed  in  whole  culture  samples  (cells  plus  medium)  by  Claude  Leger  in  Stephen 
S. Bates’ laboratory using an FMOC derivatization method (Bates et al., 1989;
Pocklington et al., 1990).

From the original 15 L of P. multiseries culture grown per carboy, eight L of
culture were harvested at an initial time point during mid- to late exponential growth
(Harvest 1). The remaining seven L of culture were harvested at a final time point during
stationary growth (Harvest 2). The cell suspension was spun in 0.5 L bottles for 15
minutes at 1000 g. The resultant pellets were pooled, split among two to four 50-mL conical tubes,
and spun briefly to remove any remaining liquid. Ten to 20 mL of Trizol were added to
the conical tubes, and the pellets were homogenized for 60 seconds at full speed
(Polytron), frozen in liquid nitrogen, and stored at -80°C for later RNA extraction. The goal was
to harvest cells during high (stationary growth) vs. low (exponential growth) DA
producing conditions. DA analysis revealed that three out of twelve growth experiments
had undetectable or minimal DA concentrations at the initial harvest and relatively high
DA concentrations at the final harvest (Table 3-1). Therefore, these three experiments
were selected for further analysis using the P. multiseries cDNA microarrays. The three
growth experiments were designated 125C, 125D, and AX1, referring to CLN-125 (non-axenic),
growth experiments C and D, and CLN-125 (Axenic), growth experiment 1,
respectively.

Table 3-1: DA Concentrations for Pseudo-nitzschia multiseries Growth Experiments: CLN-125
Axenic, Experiments 1-4; CLN-125, Experiments A-D; CLN-191, Experiments A-D.
Experiments marked with an asterisk (red in the original) represent those used for microarray studies.

EXPERIMENT            DAY   HARVEST   DOMOIC ACID (ng/ml)

CLN-125 Axenic #1*      9      1              0
CLN-125 Axenic #1*     42      2             38

CLN-125 Axenic #2       9      1             12
CLN-125 Axenic #2      42      2             55

CLN-125 Axenic #3      10      1             16
CLN-125 Axenic #3      30      2             73

CLN-125 Axenic #4      10      1             12
CLN-125 Axenic #4      30      2             58

CLN-125 A               7      1             18
CLN-125 A               9      2            123
CLN-125 A              31      3           1878

CLN-125 B               7      1             24
CLN-125 B               9      2            139
CLN-125 B              31      3           2112

CLN-125 C*              4      1              2
CLN-125 C*             10      2            267

CLN-125 D*              4      1              0
CLN-125 D*             10      2            314

CLN-191 A               7      1            547
CLN-191 A              31      2          10031

CLN-191 B               7      1            617
CLN-191 B              31      2           7768

CLN-191 C               2      1             44
CLN-191 C               8      2           1218

CLN-191 D               2      1             47
CLN-191 D               8      2           1319
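
The selection of experiments from Table 3-1 can be illustrated with a short sketch. The DA values below are transcribed from the table (first and last harvest only); the numeric cut-off for "undetectable or minimal" is not given in the text, so `MAX_INITIAL_DA` is a hypothetical value chosen only for illustration, and `select_experiments` is an illustrative helper rather than part of the original analysis.

```python
# DA concentrations (ng/ml) at the first and last harvest, transcribed from Table 3-1.
FIRST_AND_LAST_DA = {
    "CLN-125 Axenic #1": (0, 38),
    "CLN-125 Axenic #2": (12, 55),
    "CLN-125 Axenic #3": (16, 73),
    "CLN-125 Axenic #4": (12, 58),
    "CLN-125 A": (18, 1878),
    "CLN-125 B": (24, 2112),
    "CLN-125 C": (2, 267),
    "CLN-125 D": (0, 314),
    "CLN-191 A": (547, 10031),
    "CLN-191 B": (617, 7768),
    "CLN-191 C": (44, 1218),
    "CLN-191 D": (47, 1319),
}

# Hypothetical cut-off: the text says "undetectable or minimal" without a number.
MAX_INITIAL_DA = 2  # ng/ml


def select_experiments(table, max_initial=MAX_INITIAL_DA):
    """Return experiments with initial DA at or below the cut-off and higher final DA."""
    return sorted(
        name
        for name, (first, last) in table.items()
        if first <= max_initial and last > first
    )


print(select_experiments(FIRST_AND_LAST_DA))
# ['CLN-125 Axenic #1', 'CLN-125 C', 'CLN-125 D']
```

Under this assumed cut-off, the sketch returns the same three experiments that the text designates AX1, 125C, and 125D.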

Construction of P. multiseries cDNA microarray: A total of 5372 clones from the
P. multiseries cDNA library were grown overnight in Luria broth with carbenicillin (50
µg/mL), at 37°C on a shaker table. A volume of 10 µL of bacterial culture was then used
as template in 100 µL PCR reactions with primers T7 forward (TAATACGACTCACTATAGGG)
and M13 reverse (CAGGAAACAGCTATGAC), which flank the cloning site
of the pMDl (a pUC18-derived) vector (see library construction, chapter 2). PCR
conditions were optimized to include the following reagents: 1X PCR buffer (Invitrogen),
200 µM each dNTP, 2 µM each primer, 2 mM MgSO4, and 2.5 U Invitrogen HiFi Taq
polymerase. An initial DNA denaturation step at 94°C for 2 minutes was followed by 35
amplification cycles (0:30 melting at 94°C, 0:30 annealing at 55°C, 1:00 extension at
68°C). Samples of the bacterial clones used in PCR preparation were placed at -80°C in
15-30% glycerol, as back-ups of the original library clones.
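
As a rough illustration, the thermocycling program described above can be written down as data and its programmed hold time summed. This is only a sketch: the `Step` class and `programmed_minutes` function are hypothetical names, and ramp time between temperatures is ignored because it depends on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


# Values taken from the PCR conditions described in the text.
INITIAL_DENATURATION = Step("initial denaturation", 94, 120)
CYCLE = (
    Step("melting", 94, 30),
    Step("annealing", 55, 30),
    Step("extension", 68, 60),
)
N_CYCLES = 35


def programmed_minutes(initial, cycle, n_cycles):
    """Total hold time in minutes, ignoring ramp time between temperatures."""
    total_seconds = initial.seconds + n_cycles * sum(s.seconds for s in cycle)
    return total_seconds / 60


print(programmed_minutes(INITIAL_DENATURATION, CYCLE, N_CYCLES))  # 72.0
```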
PCR products were purified using Millipore MultiScreen size-exclusion filter
plates. Vacuum pressure (approximately 10 inches Hg) was applied for 20 min or until
wells were empty, to remove primers, dNTPs, and salts, while retaining the amplified
DNA on the filter. A wash step included the addition of 50 to 100 µL of nuclease-free
de-ionized water to each well; DNA was resuspended and mixed by repetitive pipetting,
and the vacuum was re-applied. The DNA was then resuspended in 100 µL nuclease-free
de-ionized water and transferred to clean plates using a mechanical pipetting
station. The DNA was split into two aliquots: one for array printing and one for quality
control and sequencing. DNA quality was verified by 1% agarose gel electrophoresis.
DNA concentration was determined by PicoGreen fluorescent staining (Ahn et al.,
1996). A limited number of samples were also quantified by measuring absorbance at
260/280 nm to verify PicoGreen results. The final DNA concentrations averaged
approximately 120 ng/µL. PCR product for printing was dried by vacuum
centrifugation and resuspended in 10 µL of 1.5 M betaine/3X SSC print buffer, yielding
a final concentration of 600 ng/µL, on average.
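
The concentration step can be checked with simple arithmetic. The sketch below assumes that the printing aliquot was half of the 100 µL eluate (50 µL) and that no DNA was lost during drying; neither assumption is stated in the text, and `concentrate` is an illustrative helper.

```python
def concentrate(conc_ng_per_ul, volume_ul, resuspend_ul):
    """Concentration after drying an aliquot and resuspending it, assuming no DNA loss."""
    mass_ng = conc_ng_per_ul * volume_ul
    return mass_ng / resuspend_ul


ALIQUOT_UL = 50.0  # assumption: half of the 100 µL eluate went to printing

final_conc = concentrate(conc_ng_per_ul=120.0, volume_ul=ALIQUOT_UL, resuspend_ul=10.0)
print(final_conc)  # 600.0 ng/µL
```

Under these assumptions, a 50 µL aliquot at 120 ng/µL holds 6000 ng of DNA, which gives 600 ng/µL when resuspended in 10 µL, in agreement with the reported average.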

P. multiseries cDNA probes were printed onto CMT-GAPS slides (Corning)
using a Biorobotics MicroGrid 610 TAS Arrayer with quill pins. 5372 P. multiseries
cDNAs were printed in duplicate; in addition, 10 control cDNAs from the SpotReport Alien
Array Validation System were printed in duplicate, resulting in a final chip including
10772 features. Spots were printed with a 32 print-tip head, producing a lay-out
represented by 8 x 4 grids (Figure 3-1). Each grid was sub-divided into two sections,
representing replicate spots (Figure 3-2). Individual features were 13 µm in diameter and
were separated by 130 µm (from one spot to the next). Approximately 0.005 µL of 600
ng/µL DNA (2-3 ng) was transferred to each spot. Final P. multiseries arrays displayed
a strong signal-to-noise ratio, with virtually no background, as demonstrated visually
(Figures 3-1 and 3-2). Results also illustrate the high degree of reproducibility between
replicate spots on the P. multiseries chip.

Figure 3-2: A representative grid from the final P. multiseries cDNA microarray, enlarged to demonstrate the high degree of
reproducibility between replicate spots on the P. multiseries chip.

Test print: A limited number of arrays were printed with 960 cDNA probes generated
from  the  original  P.  multiseries  cDNA  library.  The  test  print  led  to  optimal  protocols 
employed  in  the  final  print,  presented  above.  During  preparation  of  the  test  print,  filter 
purification of PCR product using Millipore MultiScreen-PCR plates was compared to
Qiaquick  gel  purification.  Quality  control  of  the  resultant  product  by  gel  electrophoresis, 
PicoGreen  quantification,  and  sequencing  of  resultant  product  showed  that  the  methods 
were  relatively  comparable  in  quality,  but  filter  purification  resulted  in  much  higher 
efficiencies  of  product  recovery,  by  at  least  a  factor  of  10.  A  comparison  of  sequence 
length  and  quality  in  96  samples  yielded  average  reads  of  616  bp  for  the  Millipore 
filtered  PCR  product  versus  513  bp  for  the  Qiagen  cleaned  product.  In  addition,  the  filter 
screens  were  more  time,  labor,  and  cost  efficient.  Therefore,  Millipore  Multiscreens 
were  adopted  for  subsequent  purification  of  PCR  product.  Other  trials  using  the  test  chip 
also allowed poly(A)+ RNA hybridization to be compared to total RNA hybridization against
the probe cDNAs on the microarray. Results illustrated that total RNA could be used
successfully, without loss of signal or increased background. Finally, the total amount of
RNA needed for target cDNA preparation was investigated, which led to a final protocol
requiring 10 µg of RNA per labeling reaction. This quantity of RNA was five-fold less
than  the  original  protocol  required,  which  was  helpful  in  experimental  design  and 
execution. 

RNA Preparation and Microarray Hybridizations: Total RNA was extracted from
P. multiseries cells harvested from growth experiments 125C, 125D, and AX1 during
high vs. low DA producing conditions (following the RNA extraction procedure described in
chapter 2). Total P. multiseries RNA was cleaned with Qiagen RNeasy columns and run
on formaldehyde agarose gels for quality control. (Gels were transferred onto Hybond
membrane and stored in air-tight plastic bags at -20°C for future analysis.) Ten
micrograms of P. multiseries RNA was spiked with mRNA from the SpotReport Alien Array
Validation System, incubated for 10 minutes at 65°C with oligo-dT, and then cooled at 25°C. Four
µL of 1 mM Cy3- or Cy5-conjugated dUTP were added and the mixture was incubated
at 42°C for two minutes. A master mix including 4.5 µL 0.2 M DTT, 18 µL 5X first-strand
buffer, 1.8 µL 25 mM dATP, dGTP, and dCTP, 1.8 µL 10 mM dTTP, and 2 µL of
SuperScript II reverse transcriptase was added to the RNA mixture and incubated for 1
hour at 42°C. After one hour, an additional 1 µL of SuperScript II was added and the
reaction was incubated at 42°C for another hour. Starting RNA was degraded by
addition of stop solution (3 µL 0.5 M EDTA, pH 8; 3 µL 1 N NaOH) and incubation for 30
min at 60°C. Labeled cDNA was cleaned up using Qiagen columns; Cy3-labeled cDNA
and the corresponding Cy5-labeled cDNA that were to be compared were combined and
loaded onto the same column. The labeled target cDNA pools were then hybridized to
the probe cDNAs on P. multiseries microarrays.

Arrays were processed before hybridization as follows: the slides were humidified
by holding them face-down over a steaming water bath for a few seconds, then snap-dried
on a 95°C heat block. The DNA was immobilized onto the slides by UV cross-linking
at 65 mJ. Cross-linked slides were soaked for 15 minutes in freshly prepared
succinic anhydride/sodium borate solution with gentle agitation, soaked for 2 minutes in
boiling nuclease-free, de-ionized water and, finally, rinsed in 95% ethanol and spun dry.
Arrays were stored in a room-temperature desiccation chamber until hybridization.

Processed microarrays were pre-hybridized at room temperature for 1 hour. Pre-hybridization
solution was composed of 50% formamide, 5X SSC, 0.1% SDS, and 1% BSA,
while hybridization buffer was composed of 50% formamide, 10X SSC, 0.2% SDS, and
0.26% salmon sperm DNA. Labeled cDNA was denatured prior to hybridization by heating for
2 minutes at 80°C, while the cassette and microarray were pre-warmed at 42°C. The
cDNA was then loaded onto the array, and arrays were hybridized for 16 hours at 42°C in
humidified chambers. The slides were then washed successively in 1X SSC, 0.03% SDS;
0.1X SSC, 0.01% SDS; and 0.1X SSC. Finally, the slides were dried by a brief
centrifugation.

Experiments were dye-swapped to account for differences in dye labeling and
detection efficiencies (for example, faster bleaching of Cy5 than Cy3). For each
gene expression comparison, two hybridizations were therefore completed, with the labeling
of the RNAs exchanged between the two dye-swap experiments. Technical replicates were also
run within each experiment; for example, if 2 replicates were run and two dye-swap
experiments were carried out for each replicate, there would be a total of 4 replicates
to examine. Experiments 125C and 125D included a total of 6 replicate experiments,
while AX1 included a total of 4 replicates.

Image analysis: Arrays were scanned at 595 nm (Cy3) and 685 nm (Cy5) on ArrayWoRx
scanners (Applied Precision, Inc.). The ArrayWoRx scanning system converts the signal
from the fluors to “pixel” values, which allows the data to be saved as TIFF files.
MolecularWare DigitalGenome software was then used to integrate annotated chip
information with the TIFF files and to visualize, edit (e.g., flagging spots covered by dust
particles, missing spots, and spots with low intensity for deletion), and export the data
for further analysis. Data were exported into Microsoft Excel and sorted by Cy3 and Cy5
intensities to remove any data below an intensity level of 50 in both channels;
the data were then normalized and analyzed for statistical significance.
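
The intensity filter described above can be sketched in a few lines. This is only an illustration, not the Excel procedure used in the study: the spot records and the function name keep_spot are hypothetical, and the reading that a spot is removed only when both channels fall below 50 is an assumption based on the wording "in both channels".

```python
# Hypothetical spot records: (spot_id, cy3_intensity, cy5_intensity).
MIN_INTENSITY = 50  # threshold stated in the text


def keep_spot(cy3, cy5, threshold=MIN_INTENSITY):
    """Keep a spot unless BOTH channels fall below the threshold (assumed reading)."""
    return not (cy3 < threshold and cy5 < threshold)


spots = [
    ("s1", 1200.0, 950.0),  # bright in both channels
    ("s2", 30.0, 45.0),     # below threshold in both channels -> removed
    ("s3", 40.0, 310.0),    # dim in Cy3 only -> kept
    ("s4", 50.0, 49.0),     # Cy3 is exactly at the threshold -> kept
]

filtered = [spot for spot in spots if keep_spot(spot[1], spot[2])]
kept_ids = [spot[0] for spot in filtered]
print(kept_ids)
```

A stricter filter that drops a spot when either channel is dim would replace `and` with `or`; the text does not say which variant was intended beyond the phrase quoted above.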

Data normalization: Many sources of systematic variation may exist in microarray
experiments that must be accounted for before expression levels can be compared
appropriately. In this study, loess normalization was used to correct for differences in
dye labeling and detection efficiencies, and other systematic biases in the measured
expression levels both within and across arrays (Quackenbush, 2002; Park et al., 2003).
The loess method of normalization scales individual intensities by fitting a curve to the
data using a locally weighted non-linear regression, where M = log2(Cy5/Cy3) for each
element on the array is plotted as a function of A = log10(Cy5*Cy3), the product intensity.
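
As a rough illustration of the M versus A transformation and of a locally weighted fit, the sketch below uses a plain-Python tricube-weighted local linear regression. It is not the S+ArrayAnalyzer implementation used in this study; the function names, the span parameter frac, and the example intensities are all hypothetical.

```python
import math


def ma_values(cy3, cy5):
    """Return (M, A) with M = log2(Cy5/Cy3) and A = log10(Cy5*Cy3)."""
    return math.log2(cy5 / cy3), math.log10(cy5 * cy3)


def local_fit(a0, a_vals, m_vals, frac=0.5):
    """Tricube-weighted local linear fit of M on A, evaluated at a0."""
    n = len(a_vals)
    k = min(n, max(2, math.ceil(frac * n)))          # points in the local window
    dists = sorted(abs(a - a0) for a in a_vals)
    h = dists[k - 1] or 1e-12                        # bandwidth: k-th nearest distance
    w = [(1 - min(abs(a - a0) / h, 1.0) ** 3) ** 3 for a in a_vals]
    sw = sum(w)
    mean_a = sum(wi * a for wi, a in zip(w, a_vals)) / sw
    mean_m = sum(wi * m for wi, m in zip(w, m_vals)) / sw
    sxx = sum(wi * (a - mean_a) ** 2 for wi, a in zip(w, a_vals))
    sxy = sum(wi * (a - mean_a) * (m - mean_m)
              for wi, a, m in zip(w, a_vals, m_vals))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_m + slope * (a0 - mean_a)


def loess_normalize(cy3_list, cy5_list, frac=0.5):
    """Subtract the locally fitted M(A) curve from each spot's M value."""
    pairs = [ma_values(g, r) for g, r in zip(cy3_list, cy5_list)]
    m_vals = [p[0] for p in pairs]
    a_vals = [p[1] for p in pairs]
    return [m - local_fit(a, a_vals, m_vals, frac) for m, a in zip(m_vals, a_vals)]


# Hypothetical array with a constant two-fold dye bias (Cy5 = 2 * Cy3).
cy3 = [100.0, 200.0, 400.0, 800.0, 1600.0, 3200.0]
cy5 = [2.0 * g for g in cy3]
normalized_m = loess_normalize(cy3, cy5)
print(normalized_m)
```

In this toy case every raw M equals 1, so the fitted curve is flat and the normalized log ratios collapse to approximately zero. Real data would show an intensity-dependent curve, which is the bias the loess fit is meant to remove.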

In these experiments, a loess algorithm was applied within and across each dataset using
Insightful S+ArrayAnalyzer software (Figures 3-3 to 3-8). MvA and box plots for each of
the P. multiseries growth experiments illustrate the normalization of data across the
replicate arrays. Especially notable is the correction of Cy3 vs. Cy5 intensity
differentials, illustrated by the fitting of the log2(Cy5/Cy3) ratios to the average in each
of the box plots. (Other normalization parameters and options were considered; however,
loess normalization yielded the most robust results without washing out expression
signals.)

Quality control included analyzing Cy3/Cy5 ratios for the control set of data after
normalization. The normalized intensity data for each control spot were analyzed using
linear regression analysis to verify that the total integrated intensity across the control
spots was equal for both channels (slope = 1). The slope of the Cy3 to Cy5 linear
regression approached 1 for all three experiments: the AX1 slope was 0.98, while 125C was
1.17 and 125D was 0.96 (Figures 3-9 to 3-11). The variability around the slope of the
Cy3/Cy5 ratio was especially small for Experiment AX1, and relatively small for
Experiments 125C and 125D. Cy3/Cy5 ratios calculated individually for each feature in
the whole dataset averaged 0.93 ± 0.09 in AX1, 0.98 ± 0.17 in 125C, and 1.11 ± 0.18 in
125D. In general, the standard deviation among replicate features in AX1 was less than
in 125C and 125D. Therefore, in the statistical analysis that follows, more genes were
called statistically significant in AX1 than in the other two experiments.
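
The slope check on the control spots amounts to an ordinary least-squares fit of one channel against the other. The sketch below is a minimal illustration with made-up control intensities; the function name and data are hypothetical, and fitting Cy5 on Cy3 with an intercept is an assumption, since the text does not specify the regression details.

```python
def ols_slope_intercept(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical normalized control-spot intensities (not data from this study).
cy3_controls = [100.0, 500.0, 1000.0, 5000.0]
cy5_controls = [0.98 * v + 5.0 for v in cy3_controls]

slope, intercept = ols_slope_intercept(cy3_controls, cy5_controls)
print(round(slope, 3), round(intercept, 3))
```

A slope close to 1 indicates that, after normalization, the two channels report comparable intensities for spots that are not expected to change.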

Figure 3-3: Normalization of AX1 array data (replicates represent dye-swap experiments).
MvA plot illustrates the normalization of data across the replicate arrays. M = log2(Cy5/Cy3). A = log10(Cy5*Cy3).

Figure 3-4: Normalization of AX1 array data (replicates represent dye-swap experiments).
Box plot illustrates the normalization of data across the replicate arrays. M = log2(Cy5/Cy3).

Figure 3-6: Normalization of 125C array data (replicates represent dye-swap experiments).
Box plot illustrates the normalization of data across the replicate arrays. M = log2(Cy5/Cy3).

Figure 3-7: Normalization of 125D array data (replicates represent dye-swap experiments).
MvA plot illustrates the normalization of data across the replicate arrays. M = log2(Cy5/Cy3). A = log10(Cy5*Cy3).

Figure 3-9: Linear regression analysis of control data spots - Experiment AX1.

Figure 3-10: Linear regression analysis of control data spots - Experiment 125C.

Figure 3-11: Linear regression analysis of control data spots - Experiment 125D.

Figure 3-13: 125C SAM plot (Significance Analysis of Microarrays). Scatter plot of the observed relative difference d(i) vs.
the expected relative difference dE(i). For the majority of genes, d(i) approximates dE(i), but some genes are represented by
points displaced from the d(i) = dE(i) line by a distance greater than a designated threshold, delta (represented by the dotted
line). Genes that fall outside the cutoff represented by delta are considered significant.

Figure 3-14: 125D SAM plot (Significance Analysis of Microarrays). Scatter plot of the observed relative difference d(i) vs.
the expected relative difference dE(i). For the majority of genes, d(i) approximates dE(i), but some genes are represented by
points displaced from the d(i) = dE(i) line by a distance greater than a designated threshold, delta (represented by the dotted
line). Genes that fall outside the cutoff represented by delta are considered significant.

Statistical Analysis: Significance analysis of gene expression ratios was performed using
a t-test algorithm modified for microarray analysis (Tusher et al., 2001). This method,
Significance Analysis of Microarrays (SAM), identifies genes with statistically
significant changes in expression by assimilating a set of gene-specific t-tests. SAM
assigns a score to each gene on the basis of its change in expression relative to the
standard deviation of repeated measurements. A scatter plot of the observed relative
difference d(i) vs. the expected relative difference dE(i) is used to identify significant
changes in gene expression. For the majority of genes, d(i) approximates dE(i), but some
genes are represented by points displaced from the d(i) = dE(i) line by a distance greater
than a designated threshold, delta. Genes that fall outside the cutoff represented by delta
are considered significant (Figures 3-12 to 3-14). SAM generates a test statistic "q", which
is similar to a p-value but adapted to the analysis of a large number of genes. The
q-value measures the significance of the expression ratio of a gene by reporting the lowest
false discovery rate at which the gene is called significant. The false discovery rate
(FDR) is an estimate of the percentage of genes identified by chance, based on
analyzing permutations of the repeated measurements of expression for each gene.
Varying delta adjusts the false discovery rate. A second approach to
identifying significant gene expression changes is to require consistent changes
between paired samples at a certain fold-change cut-off. In the present study, delta values
and fold-change cut-offs were selected to keep the FDR below 2.5%, which means that
for every 100 genes called significant, fewer than 2.5 genes would be expected to be
identified incorrectly.
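
To make the relative-difference score concrete, the sketch below computes a one-class SAM-style score, d = mean / (standard error + s0), for a single gene's replicate log ratios, together with the expected number of false calls implied by an FDR. It is a simplified illustration only: the text does not state which SAM response type was used, the permutation step that yields dE(i) and the q-value is omitted, and the constant s0 and the example values are hypothetical.

```python
import math


def sam_score(log_ratios, s0=0.0):
    """One-class relative difference: mean / (standard error of the mean + s0)."""
    n = len(log_ratios)
    mean = sum(log_ratios) / n
    variance = sum((x - mean) ** 2 for x in log_ratios) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean / (standard_error + s0)


def expected_false_calls(n_called, fdr):
    """Expected number of falsely called genes at a given false discovery rate."""
    return n_called * fdr


# Hypothetical replicate log2 ratios for one gene (not data from this study).
replicates = [1.0, 1.2, 0.8, 1.0]
d_small_s0 = sam_score(replicates, s0=0.1)
d_large_s0 = sam_score(replicates, s0=0.5)
print(round(d_small_s0, 2), round(d_large_s0, 2))
print(expected_false_calls(100, 0.025))
```

The added constant s0 keeps genes with very small measured variance from receiving inflated scores, which is why the score drops as s0 grows.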

Initially, each dataset corresponding to the separate growth experiments (125C,
125D, and AX1) was analyzed independently. The fold-change differences in the
non-axenic growth experiments were consistently higher than the fold-change differences in
the axenic growth experiment. For example, the cDNAs with positive fold-change
differences averaged 4.07 ± 0.97 in growth Experiment 125D, 3.85 ± 1.17 in growth
Experiment 125C, and 1.92 ± 0.54 in the axenic growth Experiment AX1. Features with
a fold-change difference of at least 2.5 or less than 0.5 were considered differentially
expressed for Experiments 125C and 125D, while features of at least 1.25 or less than 0.8
were considered differentially expressed for Experiment AX1. By raising the delta value,
lower fold-change differences could be tested for statistical significance, which was
useful in further analysis below.
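
The experiment-specific fold-change cut-offs can be expressed as a small lookup. The sketch below is illustrative: the dictionary and function names are hypothetical, and the FDR-based significance test that accompanied these cut-offs in the study is not reproduced here.

```python
# (up-regulated if >= first value, down-regulated if < second value), from the text.
THRESHOLDS = {
    "125C": (2.5, 0.5),
    "125D": (2.5, 0.5),
    "AX1": (1.25, 0.8),
}


def classify(experiment, fold_change):
    """Label a feature by fold change using the experiment's cut-offs."""
    up_cutoff, down_cutoff = THRESHOLDS[experiment]
    if fold_change >= up_cutoff:
        return "up"
    if fold_change < down_cutoff:
        return "down"
    return "unchanged"


# The same fold change can be called differently in axenic and non-axenic experiments.
print(classify("AX1", 1.92), classify("125C", 1.92))
```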

Up-regulated and down-regulated cDNAs were compared across the three
experiments, and only those transcripts that were up-regulated in all three experiments
were analyzed further. First, individual cDNAs were identified as up-regulated across all
three experiments. Next, the cDNAs were annotated based on sequence description data
previously recorded during analysis of the P. multiseries EST database. The cDNAs were
then identified as singletons or as part of a larger contig. Most of the differentially
expressed cDNAs were part of a larger contig; therefore, the array data for all of the
cDNAs within each contig were analyzed to verify the overall expression of the gene.
Encouragingly, the array data confirmed that the individual cDNAs within each contig
were consistently up- or down-regulated. Generally, the data fell within the original
guidelines for fold-change differences. Occasionally, however, lower fold-change
differences were observed that still fell within the range of statistical stringency given
above (FDR < 2.5%). Any features that were not statistically significant are represented
as '**' in the corresponding data table.

Replicate spots may be collapsed by averaging and then running statistical tests,
or the replicates may be analyzed as uncollapsed data. In this study, the two layers of
replicates (replicate cDNAs on the array and replicate hybridizations) were accounted for
by first analyzing the replicate spots uncollapsed using the modified t-test described
above, and presenting ratios for each differentially expressed cDNA replicate in tables.
Then, the replicate spots were averaged and presented with standard deviations. Finally,
gene expression ratios from individual cDNAs within a larger contig were averaged with
standard deviations. The data confirm the technical and biological replicability among
these experiments.
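
The two averaging steps (replicate spots to a per-cDNA ratio, then cDNAs to a per-contig ratio) can be sketched as follows. The example numbers are the AX1 values for Contig 5 as they can be read from the extracted fold-change table later in this chapter; using the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

# Step 1: collapse replicate spots for each cDNA (AX1, two legible cDNAs of Contig 5).
replicate_spots = {
    "178D3": [4.67, 4.55],
    "173B6": [4.71, 4.65],
}
per_cdna = {cdna: mean(values) for cdna, values in replicate_spots.items()}

# Step 2: collapse the per-cDNA averages into one contig-level ratio.
# The four AX1 per-cDNA averages reported for Contig 5.
contig5_ax1_averages = [4.61, 4.68, 3.78, 4.51]
contig_mean = mean(contig5_ax1_averages)
contig_sd = stdev(contig5_ax1_averages)  # sample standard deviation (assumed)

print(per_cdna)
print(round(contig_mean, 3), round(contig_sd, 3))
```

With these inputs the contig-level mean and standard deviation land close to the 4.40 (± 0.41) reported for Contig 5 in Experiment AX1; small differences are expected because the table entries are already rounded.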

Identification of mRNAs Regulated in DA Producing Conditions: Nucleotide and
deduced amino acid sequences were analyzed using NCBI tools (including BLAST and
ORF Finder), Pfam, SwissProt, the Vector NTI suite from InforMax (Rockville, MD),
and LaserGene from DNAStar (Madison, WI).

Results and Discussion:


The  goal  of  this  study  was  to  identify  transcripts  that  were  up-regulated  in 
P.  multiseries  cells  actively  producing  domoic  acid  (DA)  compared  to  P.  multiseries  cells 
not  producing  DA.  Samples  for  microarray  analysis  were  obtained  from  three  separate 
growth  experiments;  DA  production  increased  and  peaked  during  the  stationary  growth 
phase  in  all  three  experiments  (Figures  3-15  to  3-17).  Experiments  125C  and  125D 
were  performed  under  non-axenic  culture  conditions,  whereas  AX1  was  performed 
under  axenic  culture  conditions.  DA  concentrations  were  33  times  higher  in  the  non- 
axenic  growth  experiments  than  in  the  axenic  growth  experiments.  Higher  DA 
production  in  the  non-axenic  growth  experiments  was  expected,  based  on  results  from 
previous  studies  (Douglas  and  Bates,  1992;  Douglas  et  al.,  1993;  Bates  et  al.,  1995;  Bates 
et  al.,  2003). 

Up-regulated genes: In an effort to select for genes that were correlated specifically with
DA production and/or cell growth, significantly expressed genes were compared across
the three growth experiments, and only those transcripts that were up-regulated in all
three growth experiments were considered further. Up-regulation of gene expression was
observed for 121 individual cDNAs across all three P. multiseries growth experiments.
Of these 121 cDNAs, 117 assembled into 8 unique contigs. The remaining 4 cDNAs
(singletons) represented cDNA sequences not otherwise represented in the EST dataset.
Up-regulated cDNAs represent 2.25% of the clones printed on the P. multiseries chip.
Sequence similarity suggested that the up-regulated transcripts encode a
3-carboxymuconate cyclase, a phosphoenolpyruvate carboxykinase
(ATP-specific), an amino acid transporter, a small heat shock protein, a long-chain
fatty-acid-CoA ligase, an aldo/keto reductase, 5 hypothetical proteins, and one potentially
novel protein (Table 3-2). The following discussion focuses individually on each of
the unique contigs or sequences.

Figure 3-15: Cell growth and DA production by Pseudo-nitzschia multiseries clone CL-125, axenic cultures (AX1). Cells
were harvested for RNA extraction on the days labeled with red arrows (Day 9 and Day 42).

Figure 3-16: Cell growth and DA production by Pseudo-nitzschia multiseries clone CL-125, non-axenic culture (125C). Cells
were harvested for RNA extraction on the days labeled with red arrows (Day 4 and Day 10).

Figure 3-17: Cell growth and DA production by Pseudo-nitzschia multiseries clone CL-125, non-axenic culture (125D).
Cells were harvested for RNA extraction on the days labeled with red arrows (Day 4 and Day 10).

Contig 1, Cycloisomerase: Contig 1 includes 22 cDNAs, which form a consensus
sequence 2236 bp long (Figure 3-18). Overall average expression ratios (fold-change
differences) were 3.24 (± 0.61) in Experiment 125C, 4.03 (± 0.80) in Experiment 125D,
and 2.16 (± 0.44) in Experiment AX1 (Table 3-3). The predicted coding region for
Contig 1 revealed an open reading frame (ORF) of 525 amino acids (Figure 3-19), which
aligned with COG2706, a cluster of orthologues that identifies a conserved domain for
3-carboxymuconate cyclase (Figure 3-20). Additional BLAST analysis supported the
tentative assignment of Contig 1 as a muconate cycloisomerase (Tables 3-4 and 3-5).
The specific enzyme that this contig aligns with most closely, carboxy-cis,cis-muconate
cyclase, catalyzes the cycloisomerization of 3-carboxy-2,5-dihydro-5-oxofuran-2-acetate
to 3-carboxy-cis,cis-muconate (Figure 3-21). This isomerization is reminiscent of that
suggested in DA synthesis (Ramsey et al., 1998) and offers a target molecule
that may be directly involved in the cyclization leading to the pyrrolidine ring in the DA
molecule. Alternatively, the enzyme may be involved in converting aromatic compounds
into citric acid cycle intermediates, which have been proposed to feed the pathway
leading to DA synthesis (Ramsey et al., 1998). Searching against the Thalassiosira
pseudonana genome revealed a similar sequence within T. pseudonana scaffold 79,
which shared 70% identity and 78% similarity with P. multiseries Contig 1 (Figure 3-22).

Figure 3-18: Sequence Alignment Overview for Contig 1 (2236 bp, 22 clones, 35 sequences).
(In the sequence alignment diagrams throughout this document, dotted lines vs. solid lines represent the
direction in which the cDNAs were sequenced.)


Figure 3-19: Predicted coding region for Contig 1:

Length: 525 aa

Frame  from  to  Length
+2  155..1732  1578
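
The coordinates reported for the predicted coding regions can be checked with simple arithmetic. The sketch below is a hypothetical helper, not part of ORF Finder; it assumes forward-strand, 1-based inclusive coordinates and assumes that the reported span includes the terminal stop codon, which is consistent with the numbers given for Contig 1 (155..1732, 525 aa) and Contig 5 (53..682, 209 aa).

```python
def orf_summary(start, end):
    """Return (nucleotide length, amino acid length, forward reading frame).

    Assumes 1-based inclusive coordinates on the forward strand and a span
    that includes the stop codon (so residues = codons - 1).
    """
    nt_length = end - start + 1
    if nt_length % 3 != 0:
        raise ValueError("ORF span is not a multiple of three nucleotides")
    codons = nt_length // 3
    frame = (start - 1) % 3 + 1
    return nt_length, codons - 1, frame


contig1 = orf_summary(155, 1732)
contig5 = orf_summary(53, 682)
print(contig1, contig5)
```

Both results match the lengths and the +2 frame listed for these contigs.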


MRIYQRTPTDLSATTAGTFIRSDSNEDEGEDDDHQLFFVTSYSDFEKLAHGPRGHEAKHSVHVYRF 

FPSDGSLVLLNIQGDADVVTNPAFSRHHPRLNVIYTCTEDCHENGRIIAFKVKPDGTLEQFGEPVDA 

GGTSTCYLTIDKAERNLLAVNYWNSTLVVIPMDPDTGALIGGVKNVYDPNMGKTMVACAKKDG 

GVNHSCNDASTISARQADPHSHALVLDPFVGRVAYVPDLGKDLVREFYYDATEGNIA1ELNVMPS 

GLCTGQPDGPRYLDFHPEYNIAYVVNELSSTVAVFEVDRELLNE1HEASRNGEDMNRFRGRSTLRL 

VQSIKTIPHAFPTTMNTCGRMCVHKSGRYVIVSNRGHQSITVFRVKTKGSKRGELQIVGCYHTRGE 

TPRHFQFDNSGQYLLVANQDTDSIAVFNFNLSNGELKYSGNEYRVPSPNFVCCCPTYSEDDTEIRQ 

RQENFESSIRAVTLAKDNENNSGSDSEDSTVPTWRGRSSEDNIKAELAKAREEIETLKKLLAERVQ

Figure 3-21: 3-carboxymuconate cyclase - Reaction catalyzed:
Reference Reaction obtained from: KEGG: Kyoto Encyclopedia of Genes and Genomes (http://www.genome.ad.jp/).


Figure 3-22: Contig 1 Sequence Alignment with Thalassiosira pseudonana:

Contig 3, Phosphoenolpyruvate carboxykinase: Contig 3 includes 14 cDNAs, which
align to form a consensus sequence 2158 bp long (Figure 3-23). Overall average
expression ratios were 3.78 (± 0.26) in Experiment 125C, 2.94 (± 0.42) in Experiment
125D, and 2.95 (± 0.48) in Experiment AX1 (Table 3-6). The predicted coding region
for Contig 3 revealed an open reading frame (ORF) of 532 amino acids (Figure 3-24).
Blast analysis of the Contig 3 ORF against the SwissProt database revealed a highly
significant hit against pfam01293 (E-value = 4e-173) and COG1866 (E-value = 0.0), both
clusters of orthologues that code for phosphoenolpyruvate carboxykinase (PCK) (Figure
3-25). The consensus nucleotide sequence and individual cDNAs within the contig were
also blasted against the NR database, and the results supported the putative assignment of
Contig 3 as PCK (Tables 3-7 and 3-8). Alignment of the deduced protein with known
PCKs revealed 70% similarity and 58% identity with PCK of Campylobacter jejuni; 66%
similarity and 55% identity with COG1866; 67% similarity and 51% identity with PCK of
Escherichia coli; 62% similarity and 46% identity with PCK of Saccharomyces cerevisiae;
and 62% similarity and 45% identity with PCK of Arabidopsis thaliana. Searching the T.
pseudonana genome database revealed a similar sequence with 77% similarity and 73%
identity (Figure 3-26).

Two isoforms of PCK exist, which catalyze either ATP-dependent or GTP-dependent
decarboxylation of oxaloacetate into phosphoenolpyruvate (PEP) (Figure 3-27).
Contig 3 aligned with ATP-dependent PCK, which may be involved in several
functions, including gluconeogenesis, pyruvate metabolism, and C4 photosynthesis
(Figures 3-28, 3-29, and 3-30). PCK may play a role in anaplerotic formation of 2-oxoglutarate,
leading to the synthesis of a glutamate derivative (Lea et al., 2001). The glutamate
derivative could then lead to domoic acid synthesis, as suggested in both of the DA
models (Ramsey et al., 1998; Smith et al., 2001). Alternatively, the supply of pyruvate
could contribute to isoprenoid metabolism. Ramsey et al. (1998) suggest that the
principal pathway to the isoprenoid portion of DA is an alternative to the
traditional acetate-mevalonate pathway of isoprenoid synthesis, one that utilizes
glyceraldehyde 3-phosphate (GAP) and pyruvate (Eisenreich et al., 1998). The supply of
carbon dioxide via PCK expression may also indicate a role in C4 photosynthesis, a
debated topic in diatom research (Reinfelder et al., 2000; Johnston et al., 2001)
(discussed in Chapter 4).


Figure 3-23: Sequence Alignment Overview for Contig 3 (2158 bp, 14 clones, 20 sequences):

Figure 3-24: Predicted coding region for Contig 3:

Table 3-6: Fold-change Measurements for Individual cDNAs within Contig 3:

Figure 3-25: Contig 3 Sequence Alignment with COG1866,
Conserved Domain for Phosphoenolpyruvate carboxykinase:
PEPCK_ATP, Phosphoenolpyruvate carboxykinase.

Table 3-7: Contig 3, Blast Results, Overview:

Figure 3-26: Contig 3 Sequence Alignment with Phosphoenolpyruvate Carboxykinase Sequences
from Arabidopsis thaliana (62% similarity, 45% identity), Saccharomyces cerevisiae (62%, 46%),

Figure 3-28: Phosphoenolpyruvate carboxykinase (ATP) in Citrate Cycle:
Reference Pathway obtained from: KEGG: Kyoto Encyclopedia of Genes and Genomes (http://www.genome.ad.jp/).

Figure 3-29: Phosphoenolpyruvate carboxykinase (ATP) in Pyruvate metabolism:
Reference Pathway obtained from: KEGG: Kyoto Encyclopedia of Genes and Genomes (http://www.genome.ad.jp/).

Figure 3-30: Phosphoenolpyruvate carboxykinase (ATP) in Carbon Fixation:
Reference Pathway obtained from: KEGG: Kyoto Encyclopedia of Genes and Genomes (http://www.genome.ad.jp/).


Contig 4, Amino acid transporter: Contig 4 includes 3 cDNAs, which align to form a
consensus sequence 1818 bp long (Figure 3-31). Overall average expression ratios were
3.24 (± 0.25) in Experiment 125C, 3.31 (± 0.0.7) in Experiment 125D, and 1.91 (± 0.23)
in Experiment AX1 (Table 3-9). The predicted coding region for Contig 4 revealed an
open reading frame (ORF) of 363 amino acids, which aligned with pfam00209, a
sodium:neurotransmitter symporter family (Figures 3-32 and 3-33). The P. multiseries
sequence aligned most closely with a novel human amino acid transporter, hATB0+, with
an E-value of 8E-34 (Tables 3-10 and 3-11, Figure 3-34). hATB0+ is a Na+/Cl−-dependent
member of the neurotransmitter symporter family, with the highest sequence similarity to
the glycine and proline transporters. hATB0+ was found to transport both neutral and
cationic amino acids (Sloan and Mager, 1999). Searching the T. pseudonana genome and
P. tricornutum EST databases did not reveal any homologous sequences, suggesting
that this amino acid transporter may be unique to Pseudo-nitzschia spp. and not
present in non-toxin-producing diatoms. If this transporter is unique to P. multiseries, it
may be actively involved in export of DA from the cell or in import of a
precursor to DA into the cell.

Figure 3-31: Sequence Alignment Overview for Contig 4 (1818 bp, 3 clone consensus
sequences):

Figure 3-32: Predicted coding region for Contig 4:

Figure 3-33: Contig 4 Sequence Alignment with pfam00209, COG0733,
Conserved Domain for sodium:neurotransmitter:
Query:  193  SLMYCSDAGLFWLDVI DFYI -NFVMI LVGFFEAFGSAWAYDLP  234

Figure 3-34: Contig 4 Sequence Alignment Against hATB0+

Contig 5, Small Heat Shock Protein: Contig 5 includes seven cDNAs, which form a
consensus sequence 1001 bp long (Figure 3-35). Overall expression ratios were very
high in all three experiments, averaging 7.00 (± 0.16) in Experiment 125C, 7.81 (±
0.43) in Experiment 125D, and 4.40 (± 0.41) in Experiment AX1 (Table 3-12). The
predicted open reading frame of 209 aa appears to be a small heat shock protein (Figure
3-36, Tables 3-13 and 3-14). Pfam HMM analysis revealed homology with hsp20, a family of
alpha-crystallin hsps (E-value = 9.7E-23) (Figure 3-37). The alpha-crystallin-type heat
shock proteins are a family of small stress-induced proteins ranging from 12 to 43 kDa,
whose common feature is the alpha-crystallin domain. Generally active as large
oligomers consisting of multiple subunits, these proteins are believed to be
ATP-independent chaperones that prevent stress-induced denaturation and aggregation, and are
important in refolding in combination with other heat shock proteins (Narberhaus, 2002).
The induction of a small heat shock protein is consistent with the conditions of cell
growth, given that cells would be stressed during stationary growth as certain
environmental conditions become limiting. Interestingly, similarity searching against the
T. pseudonana database did not reveal any homology, and searching against the P.
tricornutum database revealed only a short sequence of 39 bp with weak similarity (14/39
[35%] identity). Follow-up expression studies to determine functionality will shed light
on whether this is truly a stress response protein or if it may function in DA metabolism.

Figure 3-35: Sequence Alignment Overview for Contig 5 (1001 bp, 7 clones):

Figure 3-36: Predicted coding region for Contig 5:

Length: 209 aa

Frame  from  to  Length
+2  53..682  630


MSMNLSKQAAGALAFAPFASPWSWGFGPSHFYSLAPSMRLAESAFDTDIERAIEKQQKLAQRMW 

DQASVVPQVLGQHNRYELIDNDEKFQLTVDVPGIKQDDID1KLDEGF1TVEGHREATTKNSRFSSKF 

AQTFSLDPAVDVKKITATLDNGVLVVAAPKEAAKLEEKPRRIPIQTMKKKEELKAAKHDIPVETVG 

EKEEVMDLDKEN 


Table 3-12: Fold-change Measurements for Individual cDNAs within Contig 5:
(? = value not legible in the source)

             125D                            125C                            AX1
cDNA         spot 1  spot 2  Avg.    S.D.    spot 1  spot 2  Avg.    S.D.    spot 1  spot 2  Avg.    S.D.
178D3        7.33    6.76    7.05    ?       ?       7.53    7.79    0.37    4.67    4.55    4.61    0.09
173B6        7.05    7.21    ?       0.11    7.45    8.76    ?       ?       4.71    4.65    4.68    ?
167D8        7.05    7.06    7.05    0.00    7.29    7.10    7.20    0.13    3.71    3.86    3.78    0.10
160G1        6.85    6.68    6.76    0.12    7.43    8.82    8.12    0.99    4.54    ?       4.51    0.04
AVERAGE      7.07    6.93    7.00            7.56    ?       7.81            4.41    4.38    4.40
S.D.         0.20    0.25    0.16            0.34    0.87    0.43            0.47    0.36    0.41

[Table: Blast results for individual cDNAs within Contig 5. The table number and title are not legible in
the source, and the row alignment between columns could not be recovered; column contents are listed in
the order extracted.]

PSN Identifier:           ID091, 167D8, 16E7, 173B6, 178D3, A1D7, A1D8
Length (bp):              1057, 1036, 259, 987, 1022, 551, 375
NCBI Identifier:          NP_103744.1, NP_103744.1, ZP_00170792.2, NP_103744.1, NP_103744.1
Putative Identification:  small heat shock protein, small heat shock protein, NO HITS,
                          small heat shock protein, small heat shock protein, small heat shock protein
Species or Domain Name:   Mesorhizobium loti, Mesorhizobium loti, Ralstonia eutropha,
                          Mesorhizobium loti, Mesorhizobium loti
E-value:                  2.00E-08, 2.00E-05, [illegible], 4.00E-07, 8.00E-10, [illegible]

Figure 3-37: Contig 5 Sequence Alignment with pfam00011, cd00298, COG0071
Conserved  Domain  for  small  Heat  Shock  Proteins: 
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Contig  6,  Acyl-CoA  synthetase  (AMP -forming):  Contig  6  includes  13  cDNAs,  which 
align  to  form  a  consensus  sequence  2438  bp  long  (Figure  3-38).  Overall  average 
expression  ratios  were  4.66  (±  0.92)  in  Experiment  125C,  3.76  (±  0.79)  in  Experiment 
125D,  and  2.13  (±  0.26)  in  Experiment  AX1  (Table  3-15).  Contig  6  revealed  a  coding 
region  that  was  split  into  four  separate  reading  frames.  Each  deduced  sequence  showed 
high  homology  to  AMP-forming  acyl-coA  synthetase,  so  the  coding  regions  were  spliced 
together  for  subsequent  analysis  (Figure  3-39).  The  deduced  protein  aligned  closely  with 
this  family  of  enzymes  that  act  via  an  ATP-dependent  covalent  binding  of  AMP  to  their 
substrate  (Figure  3-40,  Table  3-16);  these  enzymes  have  been  shown  to  function  in  lipid 
metabolism,  secondary  metabolite  biosynthesis,  transport,  and  catabolism  (Faergeman  et 
al.,  1997;  Sharma  et  al.,  1996;  Black  et  al.,  1992).  Up-regulation  of  this  transcript  evokes 
the  suggestion  that  it  may  function  in  the  formation  of  the  isoprenoid  side  chain  of  DA. 

Alignment with the T. pseudonana genome produced multiple hits, with two areas of high
sequence identity on scaffold 1 (Figure 3-41).


Figure 3-38: Sequence Alignment Overview for Contig 6 (2438 bp, 13 clones):

Figure 3-39: Predicted coding region for Contig 6:

Table 3-15: Fold-change Measurements for Individual cDNAs within Contig 6:

Figure 3-40: Contig 6 Sequence Alignment with Conserved Domain for Acyl-CoA synthetase:

Table 3-16: Contig 6 Blast Results, Overview:

Contig 7, Aldo/keto reductase family: Contig 7 includes seven cDNAs, which align to
form a consensus sequence 1742 bp long (Figure 3-42). Overall average expression
ratios were 3.12 (± 0.60) in Experiment 125C, 2.88 (± 0.26) in Experiment 125D, and
1.83 (± 0.18) in Experiment AX1 (Table 3-17). The predicted coding region for Contig 7
revealed an open reading frame that was split between frames -2 and -3 (Figure 3-43).
Blast analysis indicated that both reading frames were homologous to an aldo/keto
reductase conserved domain, corroborating that the deduced protein was split between
two reading frames. The predicted coding regions were spliced together for subsequent
analysis. Further analysis supported the identification of Contig 7 as likely to encode
an aldo/keto reductase (Figure 3-44, Tables 3-18, 3-19). Contig 7 aligned with a family
of proteins that includes a number of K+ ion channels with reported oxidoreductase
activity, which hints that the deduced protein may have a role in transport across the
cell membrane (Figure 3-45). Alignment of Contig 7 with the T. pseudonana genome
sequence revealed a region of similarity spanning approximately 930 bp that appears
to be interspersed with introns (overall E value = 1.7E-35).
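The frame labels used for Contig 7 (frames -2 and -3) can be illustrated with a small six-frame translation. The following is a minimal sketch, not the gene-prediction tool used in this work: it assumes the standard genetic code and a simple numbering in which frame +k starts at offset k-1 of the given strand and frame -k starts at offset k-1 of the reverse complement. Other tools may number reverse frames differently, and the short test sequence is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard nuclear code applies; '*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; unknown codons give 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Map frame label (+1..+3, -1..-3) to its translation."""
    rc = reverse_complement(seq)
    frames = {}
    for k in (1, 2, 3):
        frames[k] = translate(seq[k - 1:])    # forward strand, offset k-1
        frames[-k] = translate(rc[k - 1:])    # reverse strand, offset k-1
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, only to show the frame labels.
    for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(f"{frame:+d}: {protein}")
```

An open reading frame that continues in a neighbouring frame, as described for Contig 7, would show up here as a protein-like stretch that ends in one frame and resumes in another on the same strand, which is consistent with a frameshift in the consensus sequence.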
Figure 3-42: Sequence Alignment Overview for Contig 7 (1742 bp, 7 clones, 8 sequences):

Figure 3-43: Predicted coding region for Contig 7 (ORF split between two reading frames):

Figure 3-45: Contig 7 Sequence Aligned with pfam00248:

Contig 8, Unidentified: Contig 8 includes four cDNAs, which align to form a consensus
sequence 1055 bp long with an open reading frame of 276 aa (Figures 3-46, 3-47). Overall
average expression ratios were 6.90 (± 0.93) in Experiment 125C, 5.10 (± 1.49) in
Experiment 125D, and 3.01 (± 0.87) in Experiment AX1 (Table 3-20). Similarity
searches revealed weak similarity (46/114; 40%) with myotubularin, which displays dual
tyrosine and serine phosphatase activity (Table 3-21) (Cui et al., 1998; Laporte et al.,
1998). ProSite identified two regions within the Contig 8 ORF as tyrosine sulfation sites
(amino acid residues 137-151, gmemdqdYtrndasl, and 212-226, alcaaddYfmepnik).
Tyrosine sulfation is a post-translational modification of many secreted and
membrane-bound peptides (Figure 3-49; Moore, 2003). The up-regulation of Contig 8 in
correlation with increased DA production may suggest that this protein has some role in
post-translational modification of a precursor molecule leading to DA, which is
ultimately destined to be secreted into the marine environment. A search against the
T. pseudonana database revealed a 139 aa region of similarity with 54% positives and 36%
identity (Figure 3-48).
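The ORF length and the positions of the sulfated tyrosines quoted for Contig 8 can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that the nucleotide coordinates listed with the predicted coding region (171..1001 for Contig 8) are inclusive and contain the stop codon, and the function names are hypothetical.

```python
def orf_length_aa(start, end, includes_stop=True):
    """Amino acid length of an ORF given inclusive nucleotide coordinates.

    Assumption: the coordinates span whole codons and, by default,
    include the terminal stop codon.
    """
    n_nt = abs(end - start) + 1
    if n_nt % 3 != 0:
        raise ValueError("coordinate span is not a whole number of codons")
    codons = n_nt // 3
    return codons - 1 if includes_stop else codons


def tyrosine_position(first_residue, motif):
    """Residue number of the capitalised tyrosine (Y) within a ProSite motif."""
    return first_residue + motif.index("Y")


if __name__ == "__main__":
    print("Contig 8 ORF length (aa):", orf_length_aa(171, 1001))
    for first, motif in ((137, "gmemdqdYtrndasl"), (212, "alcaaddYfmepnik")):
        print(f"site starting at {first}: Tyr at residue",
              tyrosine_position(first, motif))
```

Under these assumptions the 831 nt span gives 277 codons, or 276 amino acids plus a stop, which agrees with the reported ORF length; the candidate sulfated tyrosines fall at residues 144 and 219.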
Figure 3-46: Sequence Alignment Overview for Contig 8 (1055 bp, 4 clone consensus
sequences):

Figure 3-47: Predicted coding region for Contig 8:


Length:  276  aa 


Frame  from  to  Length 


-1  h  171..  1001  831 

MRSNFILLAFSLLGSSFLLDRAQGLSGLGKLVEAQYEKRVSLG 

LNIGEGQNSKLAINGIVFDLMKEESRTEFSEMGK.HWRASYTGGLHMLNIVQDGSFVSKQGKETVK 
I  .KGCWF.IVWRF.GDHGSLYCGMEMDODYTRNDASLKGMTYVSFNAWSKEGLK.KAOEFKERSAK 
RANMAI  HKRDF.F.l  SKMI.FTSNIFOKGLHYYNALCAADDYFMEN1KMKAVSDEEVVOFEGDMYV 
SKNGKVWAHDSSKGKQVMIGTVSLELMNKQA 


*Underlined regions represent ProSite-identified tyrosine sulfation sites.

Figure 3-49: Tyrosine sulfation

Tyrosylprotein sulfotransferase catalyzes the transfer of sulfate from the universal
sulfate donor PAPS to the hydroxyl group of a luminally oriented peptidyltyrosine residue
to form a tyrosine O4-sulfate ester and 3',5'-ADP (Moore, 2003).


Singletons: Clone 17F11 was sequenced in both directions, yielding a consensus
sequence of 1270 bp with an open reading frame of 236 aa (Figures 3-51, 3-52). However,
gene prediction analysis of Clones 45H6, 75E8, and 6H1 revealed several possible
reading frames; therefore, no single reading frame could be assigned to these cDNAs. All
three of these clones were sequenced in both directions, yielding lengths of 323 bp for
45H6, 919 bp for 75E8, and two non-overlapping sequences of 552 bp and 516 bp for
6H1 (Figures 3-54 to 3-56). BLAST analysis did not reveal conclusive homologies for
any of these cDNAs; however, the high fold-change values for these clones suggest that
they may be promising candidates for future study (Tables 3-22, -25, -27, -29). Clones
45H6 and 75E8 demonstrated some of the highest values seen in the axenic growth
experiments, and also showed high fold-change values in Experiments 125C and 125D.
Homology searches did suggest that all four of these clones may encode proteins that
exhibit hydrolase or isomerase activity (Tables 3-23, -26, -28, -30). For example, 6H1
appears to have some homology to a glutamine-hydrolyzing asparagine synthase, with
43% identity and 53% similarity over a 30 aa conserved region that is part of cd00712.1,
a glutamine amidotransferase domain.
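The identity and similarity percentages quoted here and in the Blast tables follow directly from the match counts. The sketch below is a hedged illustration: it assumes the percentage is the number of matching positions divided by the alignment length, truncated to a whole number. Truncation is an inference from the reported values (for example, 35/122 is listed as 28%), not something stated in the text, and the function name is hypothetical.

```python
def percent_of_alignment(matches, aligned_length):
    """Whole-number percentage of matching positions, truncated (not rounded)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * matches) // aligned_length


if __name__ == "__main__":
    # (label, matches, aligned length) taken from the text and Blast tables.
    examples = [
        ("6H1 M13r identities", 13, 30),
        ("6H1 M13r positives", 16, 30),
        ("75E8 identities", 35, 122),
        ("75E8 positives", 51, 122),
        ("45H6 identities", 17, 64),
        ("45H6 positives", 28, 64),
    ]
    for label, matches, length in examples:
        print(f"{label}: {matches}/{length} = "
              f"{percent_of_alignment(matches, length)}%")
```

Short alignments such as 13/30 can yield moderate percentages by chance, which is one reason the accompanying E-values (all greater than 1 for these clones) are treated as inconclusive.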
Figure 3-51: Sequence Alignment Overview for P. multiseries cDNA 17F11
(1270 bp, 1 clone, 2 sequences):

Figure 3-52: Predicted coding region for P. multiseries cDNA 17F11:


Length:  236  aa 


Frame  from  to  Length 
+1  a  49..  759  711 


MDNALRDLTEHHFDPTTMSLLPSKGWESPPNYFIATTPGHPWMLMTLHYGIGSLSKIVSSMRNNP 

AKHTGPSAFKIGFILFQRAIG1DTDGYLPAGIYNGAMFNGTIQQFAEAGIVLSGEAGGGKGNGTHSE 

QKPQRSITLVGSKDNFHQYVDRKA1RQYSKELRKMNMSHWHEQERRPKKRVSCLEHMERQDERV 

SALNLTLPSVWVPPTDMDSWWYPRYQKANYDFNGTFIEPS 


Table 3-22: Fold-change Measurements for P. multiseries cDNA 17F11:

                    125D    125C    AX1
Replicate spot 1    5.55    6.19    2.02
Replicate spot 2    5.35    3.77    2.13
Average             5.45    4.98    2.07
S.D.                0.14    1.71    0.08
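The Average and S.D. rows of Table 3-22 can be reproduced from the two replicate spots. This is a minimal sketch using the replicate values from the table; it assumes the S.D. is the sample standard deviation (n - 1 denominator), which is inferred from the reported numbers rather than stated in the text. The dictionary and function names are hypothetical.

```python
from statistics import mean, stdev

# Replicate spot fold-change values for cDNA 17F11, from Table 3-22.
FOLD_CHANGE_17F11 = {
    "125D": (5.55, 5.35),
    "125C": (6.19, 3.77),
    "AX1": (2.02, 2.13),
}


def summarize(replicates):
    """Return {experiment: (average, sample S.D.)} for replicate measurements."""
    return {
        experiment: (mean(values), stdev(values))
        for experiment, values in replicates.items()
    }


if __name__ == "__main__":
    for experiment, (avg, sd) in summarize(FOLD_CHANGE_17F11).items():
        print(f"{experiment}: average = {avg:.3f}, S.D. = {sd:.3f}")
```

With only two replicate spots per experiment, the S.D. is simply the difference between the spots divided by the square root of 2, so a large value such as that for Experiment 125C reflects disagreement between the two spots rather than a well-estimated variance.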
Table 3-23: P. multiseries cDNA 17F11 Blast against NR database:


Positives 
Figure 3-54: Sequence Alignment Overview for P. multiseries cDNA 75E8
(919 bp, 1 clone, 2 sequences):

Table 3-25: Fold-change Measurements for P. multiseries cDNA 75E8:

                    125D    125C    AX1
Replicate spot 1    3.36    3.25    3.48
Replicate spot 2    3.16    3.18    3.59
Average             3.26    3.21    3.54
S.D.                0.14    0.05    0.07

Table 3-26: P. multiseries cDNA 75E8 Blast against NR database:

Consensus  Length  NCBI Identifier  Description          Species                  E-value  Identities (%)  Positives (%)
75E8       919     AAM54097.1       3-O-acyltransferase  Actinosynnema pretiosum  4.7      35/122 (28%)    51/122 (41%)
Figure 3-55: Sequence Alignment Overview for P. multiseries cDNA 6H1
(1 clone, 2 non-overlapping sequences, M13r = 552 bp, T7 = 516 bp):

Table 3-27: Fold-change Measurements for P. multiseries cDNA 6H1:

                    125D    125C    AX1
Replicate spot 1    5.73    2.86    2.51
Replicate spot 2    5.16    2.77    2.59
Average             5.45    2.81    2.55
S.D.                0.40    0.06    0.06

Table 3-28: P. multiseries cDNA 6H1 Blast against NR database:

          Length (bp)  NCBI Identifier  Description                                  Species                 E-value  Identities (%)  Positives (%)
6H1 M13r  552          ZP_00118359.1    Asparagine synthase (glutamine-hydrolyzing)  Cytophaga hutchinsonii  2.0      13/30 (43%)     16/30 (53%)
6H1 T7    516          BAC55537.1       NADH dehydrogenase                           Carex shimidzensis      1.1      20/68 (29%)     33/68 (48%)
Figure 3-56: Sequence Alignment Overview for P. multiseries cDNA 45H6
(323 bp, 1 clone, 3 sequences):

Table 3-29: Fold-change Measurements for P. multiseries cDNA 45H6:

                    125D    125C    AX1
Replicate spot 1    3.38    3.79    3.14
Replicate spot 2    3.31    3.97    3.21
Average             3.35    3.88    3.17
S.D.                0.05    0.13    0.05

Table 3-30: P. multiseries cDNA 45H6 Blast against NR database:

Consensus  Length  NCBI Identifier  Description                         Species                  E-value  Identities (%)  Positives (%)
45H6       323     AAN39118.1       peptidylprolyl cis-trans isomerase  Drosophila melanogaster  6.7      17/64 (26%)     28/64 (43%)
Contig 2, Novel: Contig 2 includes 53 cDNAs, which align to form a consensus
sequence 2445 bp long (Figure 3-57). Overall average expression ratios were 4.25
(± 0.56) in Experiment 125C, 4.24 (± 0.09) in Experiment 125D, and 1.57 (± 0.03) in
Experiment AX1 (Table 3-31). No open reading frame could be conclusively determined
for Contig 2. Similarity searches revealed no significant homology with any known
protein, although some cDNAs demonstrated weak similarity to
glutamate-ammonia-ligase adenylyltransferase (Identities = 17/49 (34%), Positives =
25/49 (51%)) (Table 3-32). No homologous sequences were found in T. pseudonana or
P. tricornutum. The consistent up-regulation among the individual cDNAs within this
contig suggests that it is truly expressed; however, further investigation will be needed
to determine the identity of this transcript.

Figure 3-57: Sequence Alignment Overview for Contig 2 (2445 bp, 53 clones, 78 sequences):

Table 3-31: Fold-change Measurements

[Table 3-31 is not legible in the scanned source. Recoverable layout: one row per
Contig 2 cDNA, with columns for the cDNA identifier and, for each of Experiments 125D,
125C, and AX1, replicate spot 1, replicate spot 2, average (AVG), and S.D.; a final
AVERAGE row summarizes each column. Legible cDNA identifiers: 135C4, 137A11, 137F1,
137F11, 160B6, 160H4, 166D7, 168D12, 168F1, 168H1, 169D6, 170B3, 170C11, 170F12,
171B3, 171C3, 171F4, 173B5, 174C3, 175D6, 175H9, 177D6, 177E3, 178B9, 178D11, 179C7,
179F10, 179H9, 17B7, 17E2, 186E1, 37826F4, 37826H7, 45B10, 45F9, 47G2, 47H11, 51E3,
53H4, 54H2 rep 1, 54H2 rep 2, 55A9, 55B4, 72B5, 74B5, 75B12, 75C4, 77F4.]


Table 3-32: Contig 2 Blast Results, Overview:

Down-regulated genes: Fifteen contigs were observed to be down-regulated during the
transition to DA production (Table 3-33). Within this group are several genes whose
likely functions can be assigned based on amino acid sequence homology. 5G12 is likely
to encode P. multiseries ribosomal protein L22. The down-regulation of transcription of
a ribosomal protein mRNA would be consistent with the switch from log phase growth to
stationary phase. Similarly, sequence similarity suggests that 78B2 may represent a
P. multiseries protein with functionality similar to the mammalian protein Kif4. This
kinesin family member is a motor protein that is suggested to play an essential role in
the organization of central spindles and midzone formation during cytokinesis (Kurasawa
et al., 2004; Lee and Kim, 2004). Down-regulation of a gene product involved in cell
division would also be consistent with the switch from log phase growth to stationary
phase. The down-regulation of PSN0100, which likely encodes PPi-phosphofructokinase,
is of interest because it may suggest an alteration in pathways involving energy
metabolism in P. multiseries cells as they transition from log phase growth to stationary
phase.

The down-regulation of FCP is of interest. As discussed in Chapter 2, FCPs are
major components of the photosystem II-associated light harvesting complex in diatoms
and other brown algae (Bhaya and Grossman, 1993). Down-regulation of FCP in P.
multiseries may be a significant aspect of the transition to stationary growth, when
photosynthesis would presumably decrease as cell growth slows due to a limiting factor.
The down-regulation of PSN0020, a presumptive P. multiseries heat shock factor 2, may
represent a transition in the chaperone content of P. multiseries cells as they enter
stationary phase.

It  is  also  of  interest  to  note  that  over  half  of  the  down-regulated  genes  (8  of  15) 
show  no  significant  homology  to  any  known  protein  coding  sequence.  These  contigs 
provide  an  opportunity  to  discover  new  functions  associated  with  the  transition  to  toxin 
production  in  P.  multiseries. 
Table 3-33: Overview of Down-regulated cDNAs in PSN Differential Expression Study

Conclusions:


The P. multiseries cDNA microarray included 5372 cDNAs. Based on the
redundancy calculations from Chapter 2, the number of non-redundant sequences
represented on the array may be estimated as 3398, or approximately 85% of the
estimated number of genes represented in the library. This suggests that an additional
two or three transcriptionally up-regulated and down-regulated genes remain to be
discovered within the current library, based on the parameters used in the current
analysis. However, genes that are up- or down-regulated at levels below the current
cutoffs will also be important to understanding the metabolic activities associated with
toxin production. These transcripts remain to be discovered in the current dataset and
library. It should also be noted that genes whose expression is regulated by
post-transcriptional mechanisms, including translation, will not be identified by the
current microarray analysis and remain to be discovered by alternative strategies.
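The estimate of undiscovered regulated genes can be reproduced with a short calculation. This is a sketch under a stated assumption, namely that regulated genes are distributed evenly across the library, so that the array's approximately 85% coverage applies equally to them; the text does not spell out the calculation, and the function name is hypothetical.

```python
def expected_remaining(found, coverage):
    """Expected number of regulated genes not yet on the array.

    Assumes the genes found so far are a `coverage` fraction of all such
    genes in the library (uniform sampling assumption).
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    return found / coverage - found


if __name__ == "__main__":
    coverage = 0.85          # fraction of library genes on the array
    on_array = 3398          # estimated non-redundant sequences on the array
    print("estimated genes in library:", round(on_array / coverage))
    for label, found in (("up-regulated", 12), ("down-regulated", 15)):
        extra = expected_remaining(found, coverage)
        print(f"{label}: {found} found, about {extra:.1f} more expected")
```

Under this assumption the twelve up-regulated and fifteen down-regulated transcripts correspond to roughly two and three additional genes respectively, in line with the "two or three" stated above.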

The  analysis  of  P.  multiseries  transcripts  following  induction  of  DA  synthesis 
has  identified  27  transcripts  of  interest,  twelve  up-regulated  and  fifteen  down-regulated 
transcripts.  The  further  characterization  of  these  transcripts  and  elucidation  of  their 
functional  significance  in  the  regulation  of  P.  multiseries  physiology  and  their  potential 
significance  in  toxin  production  provide  a  series  of  entry  points  to  better  understand  the 
physiology  and  biochemistry  of  P.  multiseries.  These  transcripts  may  also  be  useful  in 
ecological  field  studies  in  which  they  may  serve  as  signatures  of  toxin  production. 
Chapter IV


Synthesis and Future Work

The identification and characterization of Pseudo-nitzschia multiseries cDNAs in
this study provides an entry point for the investigation of the physiological, functional,
and biochemical significance of these genes to P. multiseries biology and ecology.
Screening the library for mRNA species that are up-regulated and down-regulated
during toxin production is a first step towards fully understanding the physiological
pathways that are associated with DA production in P. multiseries. In the immediate
future, investigation of the functional role of each of these transcripts is warranted.
Studies directed towards determining the causes and consequences of modulation of these
genes will be of great interest. Exploration of the environmental factors that promote
up-regulation and down-regulation of these genes should yield further insight into the
biology of P. multiseries. Taken together, these lines of investigation may allow the use
of some or all of these transcripts as markers of P. multiseries physiology in the field to
monitor ecologically relevant activities of P. multiseries such as toxin production and
photosynthetic activity.

A number of immediate follow-up studies would allow more complete
characterization of the transcripts identified in this thesis. The development of specific
assays for each transcript of interest, through the use of quantitative PCR, RNase
protection and/or Northern blotting, would be of great value. In addition to providing
quantitative confirmation of the behavior of each up- or down-regulated species, these
assays will allow more extensive sets of experiments to be carried out to quantitate the
modulation of each mRNA species under a broad range of physiological and biochemical
conditions. These detailed studies may allow the identification of mRNAs that are
particularly useful as early indicators of the initiation of toxin production or the switch of
P. multiseries cells to an alternative growth state. Northern blotting experiments will
have an additional value: they may identify alternative mRNA forms that are
differentially regulated due to alternative promoters, polyadenylation or splicing.
Systematic PCR or RNase protection across each transcript would complement
Northern blotting studies by revealing alternative splicing patterns that involve
sequences too short to be detected on Northern blots.

The isolation and characterization of the genomic DNA that corresponds to
each transcript would be of interest. It should be possible to define the promoters and
regulatory elements that are responsible for the transcriptional control of these
sequences. Sequence analysis and functional studies could permit the identification of
key regulatory sequences responsible for the expression or repression of the transcripts
we have identified during toxin production.

The functional properties of the genes we have identified can be studied in a
number of ways. For many genes, characterization by expression of a full-length cDNA in
an expression system such as the Xenopus oocyte microinjection system could reveal
some important functional characteristics of the gene. This approach would be
particularly appropriate for genes involved in signaling and membrane transport. For
genes involved in biochemical pathways present in microorganisms such as yeast,
expression in a cell mutant for the enzyme likely to be encoded by the cDNA may be a
particularly useful strategy for characterization of the enzymatic activity of the P.
multiseries gene product.

The ability to introduce genes into P. multiseries for functional studies will also
be of great interest. Several strategies may be useful. The development of RNA
interference (RNAi) technology in P. multiseries to inhibit expression of target genes
would be particularly useful in determining the functional activity of the genes of
interest. The ability to express genes whose expression levels are reduced or absent
during a particular stage of the P. multiseries life cycle would also be of interest. This
might allow, for example, the identification of genes that initiate the program of
expression that leads to toxin production in P. multiseries. DNA transformation of
genes into diatoms has been demonstrated using a microparticle bombardment system
(Apt et al., 1996; Dunahay et al., 1995; Falciatore et al., 1999). Adaptation of a gene
transfer protocol of this type to P. multiseries would permit experiments of the type
described above to be performed.

The Thalassiosira pseudonana and Phaeodactylum tricornutum databases have
proved to be extremely useful tools in the characterization of the Pseudo-nitzschia
multiseries genes we have identified. Many of the P. multiseries sequences that were
putatively identified based on sequence similarity with a known protein also matched
sequences in T. pseudonana and P. tricornutum. Most often, these sequences showed a
higher degree of similarity to T. pseudonana and P. tricornutum sequences than to
sequences from non-diatoms. We have also identified many genes that have no
significant sequence relationship to any gene in the public databases. A subset of these
genes shows a significant sequence relationship to a gene in the T. pseudonana database,
the P. tricornutum database, or both. These genes should be considered to be
diatom-specific transcripts. Characterization of the functional properties of these
transcripts should illuminate some of the biological properties specific to diatoms as
well as the evolutionary history of diatoms.

The numerous transcripts that did not match any known proteins in the public
databases, nor any entry in the T. pseudonana and P. tricornutum databases,
may represent novel sequences that will help to elucidate unique aspects of P. multiseries
biology, such as toxin production. The inactivation of these transcripts in P. multiseries
by siRNA or other methods may illuminate their potentially unique role in the biology of
P. multiseries.

Our  findings  have  potential  significance  in  the  understanding  of  photosynthesis  in 
P.  multiseries.  As  noted  in  chapters  2  and  3,  high  sequence  identity  to  known  proteins 
substantiates  the  identification  of  Contig  3,  PSN0016  as  phosphoenolpyruvate 
carboxykinase  (PCK)  in  P.  multiseries.  Up-regulation  of  this  transcript  was  noteworthy 
in  light  of  the  current  debate  about  C4  photosynthesis  in  diatoms.  In  addition,  the 
potential  identification  of  a  C4-specific  pyruvate,  orthophosphate  dikinase  (PPDK) 
suggests  the  possibility  of  a  C4  pathway  in  P.  multiseries.  C4  photosynthesis  is  thought 
to  have  evolved  in  certain  plants  as  an  adaptation  to  the  competition  of  oxygen  with 
carbon  dioxide  for  ribulose-l,5-bisphosphate  (rubisco),  a  key  enzyme  in  photosynthesis. 
This  competition  occurs  both  during  periods  of  high  productivity,  when  carbon  dioxide  is 
consumed  in  the  fixation  reactions,  altering  the  carbon  dioxide  to  oxygen  ratios  in  the 
space  around  the  cell,  and  at  high  temperatures,  when  the  affinity  of  rubisco  for  CO2 
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decreases.  Condensation  of  O2  with  rubisco  results  in  the  absence  of  fixation  of  CO2. 
Therefore,  some  plants  have  evolved  a  coping  mechanism  in  which  carbon  fixation 
involves  multiple  steps  (Leegood,  2002).  In  one  pattern  of  C4  photosynthesis, 
bicarbonate  is  fixed  to  oxaloacetate  and  transported  to  a  separate  compartment 
(oxaloacetate  may  be  reduced  to  malate  or  converted  to  aspartate  for  transfer,  and  is  then 
converted  back  into  oxaloacetate),  where  the  molecule  is  oxidized  and  decarboxylated  to 
yield  pyruvate  and  CO2  by  the  action  of  PCK.  The  carbon  dioxide  may  now  be  fixed  by 
rubisco  and  photosynthesis  proceeds,  as  in  traditional  C3  photosynthesis. 

The  diatom  debate  asks  the  question  of  whether  C4  photosynthesis  exists  in 
diatoms.  Reinfelder  et  al.  (2000)  suggest  that  C4  photosynthesis  does  exist  in  diatoms,  in 
their  study  of  Thalassiosira  weissflogii,  based  on  carbon  labeling  studies  that  show 
increased  phosphoenolpyruvate  carboxylase  activity  (which  functions  in  carbon 
acquisition  in  C4  photosynthesis)  and  increased  carbon  pools  in  the  form  of  malate 
during  low  carbon  dioxide  or  Zn- stressed  conditions.  Johnston  et  al.  (2001)  dispute  the 
conclusions  made  in  the  previous  study,  suggesting  that  further  evidence  is  needed, 
including  a  better  understanding  of  the  role  of  PCK,  before  C4  photosynthesis  can  be 
assigned  to  diatoms.  In  the  present  study,  up-regulation  of  PCK  occurred  under  highly 
productive  conditions  with  high  cell  densities;  therefore,  there  is  a  real  possibility  that 
carbon:oxygen  ratios  were  altered  in  these  experiments.  In  addition,  one  proposed 
pathway  for  DA  synthesis  suggests  that  the  precursor  units  to  DA  would  be 
biosynthesized  in  separate  compartments  within  the  cell,  which  would  fit  the  model  of  C4 
photosynthesis  (Douglas  et  al.,  1992;  Ramsey  et  al.,  1998).  These  results  suggest  that
PCK  may  play  dual  roles  in  P.  multiseries,  potentially  acting  in  both  C4  photosynthesis
and  DA  synthesis  by  liberating  carbon  dioxide  and  pyruvate  in  the  same  reaction.  A  role 
for  C4  carboxylation  in  DA  synthesis  has  been  suggested  in  the  past  (Bates,  1998).
Further  studies  into  the  functional  role  of  PCK  up-regulation  in  P.  multiseries  will  help  to 
clarify  the  role  of  C4  photosynthesis  in  diatoms.  In  addition,  localization  studies  with 
PCK  or  PPDK  would  help  to  determine  if  PCK  or  PPDK  activity  is  confined  within  a 
specific  compartment  within  the  cell. 

The  identification  of  a  potential  amino  acid  transporter  that  is  closely  related  to
the  neurotransmitter  symporter  family  is  intriguing.  In  plants,  GABA  neurotransmitter 
transporters  appear  to  play  a  key  role  in  signaling  and  pollen  tube  guidance  in 
Arabidopsis  (Palanivelu  et  al.,  2000,  2003).  These  findings  raise  the  hypothesis  that
DA  is  itself  a  signaling  molecule.  A  functional  expression  approach  that  would  be
especially  useful  for  this  transporter  is  to  inject  its
mRNA  into  Xenopus  oocytes,  which  then  synthesize  the  protein  so
that  transport  function  or  ligand-binding  properties  can  be  assessed.  In  addition,
antibodies  against  specific  segments  of  the  transport  protein  would  help  to  determine 
which  areas  are  exposed  to  one  side  or  the  other  of  the  membrane. 

The  availability  of  P.  multiseries  cDNA  microarray  technology  developed  in  this 
thesis  offers  the  ability  to  continue  expression  studies  to  address  other  questions  relating 
to  DA  synthesis  and  P.  multiseries  biology.  For  example,  this  study  compared  gene 
expression  across  axenic  vs.  non-axenic  cultures  in  order  to  target  Pseudo-nitzschia 
genes  that  are  specifically  related  to  DA  production,  and  to  reduce  the  likelihood  of 
amplifying  bacterial  genes  that  may  enhance  toxin  production.  However,  because  DA
production  is  enhanced  by  bacteria,  further  analysis  of  this  dataset  and  future
experiments  could  select  for  genes  that  are  up-regulated  in  non-axenic  relative  to
axenic  cultures,  which  would  help  clarify  the  role  of  bacteria  in
enhancing  DA  production.  Other  experiments  may  focus  on  the  effects  of  nutrient
limitation,  such  as  silicon  limitation,  which  appears  to  enhance  DA  production.  Another 
useful  application  of  this  technology  will  be  to  investigate  gene  expression  in  other
Pseudo-nitzschia  species,  including  both  toxic  and  non-toxic  strains.

While  the  initial  analysis  of  this  dataset  has  successfully  fulfilled  the  original 
goals  of  this  project,  the  data  generated  from  the  microarray  experiments  will  continue  to 
be  useful  as  they  are  annotated  and  analyzed  further.  For  example,  further  annotating  the 
down-regulated  genes  should  help  to  broaden  the  picture  and  allow  further  hypotheses  to 
be  generated  that  will  guide  future  research.  The  assessment  of  genes  that  are  up-  or
down-regulated  at  levels  below  the  current  cutoffs  will  also  be  of  importance  in 
completing  the  picture  of  gene  expression  changes  during  toxin  production.  In  addition,  it
will  be  interesting  to  search  the  dataset  for  specific  genes  that  are  of  potential  interest, 
such  as  glutamate  dehydrogenase.  Ramsey  et  al.  (1998)  found  that  the  labeling  pattern  of 
carbon  incorporation  into  DA  was  consistent  with  a  biosynthetic  pathway  via  alpha- 
ketoglutarate.  Glutamate  dehydrogenase  catalyzes  the  reversible  reaction  between 
glutamate  and  alpha-ketoglutarate.  Glutamate  dehydrogenase  expression  data  showed
that  it  was  up-regulated,  but  to  a  degree  that  fell  below  the  cut-off  criteria  utilized  in  the  initial  analysis.
Up-regulation  of  this  enzyme  in  correlation  with  DA  synthesis  is  consistent  with  the  model  of
Ramsey  et  al.  (1998),  and  further  investigation  into  the  functional  role  of  this  enzyme  in  DA  biosynthesis
should  prove  to  be  informative. 
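
The re-screening described above, in which genes that changed by less than the original cutoff are revisited, can be illustrated with a minimal sketch. It assumes expression is summarized as log2 ratios (toxin-producing vs. reference) across replicate hybridizations. The function name, both cutoff values, and all numbers in the example are hypothetical and are not taken from the analysis in this thesis.

```python
from statistics import mean

def classify_genes(log2_ratios, strict_cutoff=1.0, relaxed_cutoff=0.5):
    """Label each gene by its mean log2 expression ratio.

    log2_ratios maps a gene name to a list of replicate log2 ratios.
    Both cutoffs are illustrative assumptions, not the thesis criteria.
    """
    calls = {}
    for gene, ratios in log2_ratios.items():
        m = mean(ratios)
        direction = "up" if m > 0 else "down"
        if abs(m) >= strict_cutoff:
            # Passes the original (strict) screen.
            calls[gene] = direction
        elif abs(m) >= relaxed_cutoff:
            # Changed, but missed by the strict screen.
            calls[gene] = direction + " (below cutoff)"
        else:
            calls[gene] = "unchanged"
    return calls

# Hypothetical replicate data, invented purely for illustration.
example = {
    "PCK": [1.4, 1.2, 1.6],
    "glutamate dehydrogenase": [0.7, 0.6, 0.8],
    "hypothetical_contig_A": [-0.1, 0.2, 0.0],
    "hypothetical_contig_B": [-1.3, -1.1, -1.5],
}

calls = classify_genes(example)
for gene, label in calls.items():
    print(f"{gene}: {label}")
```

With these invented values, a gene such as glutamate dehydrogenase would be missed by the strict screen but flagged by the relaxed one, which is the kind of candidate the follow-up analysis is meant to recover. A real re-analysis would also need a statistical test across replicates rather than a mean ratio alone.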

As  is  the  nature  of  microarray  experiments,  the  initial  analysis  of  any  one  dataset 
is  a  first  step  into  a  set  of  data  that  will  continue  to  offer  useful  information.  The  results 
reported  here  will  help  guide  future  experiments  and  continue  to  facilitate  our 
understanding  of  the  biochemical  pathways  in  P.  multiseries  and  other  diatoms.  In 
addition,  this  study  demonstrates  the  potential  of  applying  cDNA  microarray  technology 
to  the  identification  of  transcriptionally  regulated  genes  in  P.  multiseries  and  other 
marine  diatoms  and  offers  a  useful  resource  to  the  harmful  algal  bloom  community. 